Abstract

In this paper, we develop a novel design framework for energy-efficient spectrum sharing in cognitive radio networks, where autonomous primary users and secondary users aim to minimize their average energy consumption subject to minimum throughput requirements. Most existing works proposed stationary spectrum sharing policies, in which the users transmit at fixed power levels. Since the users transmit simultaneously under stationary policies, they need to transmit at high power levels to fulfill the minimum throughput requirements in the presence of multi-user interference. To improve energy efficiency, we construct nonstationary spectrum sharing policies, in which the users transmit at time-varying power levels. Specifically, we focus on TDMA (time-division multiple access) policies in which only one user transmits at each time (although the users do not necessarily transmit in a round-robin fashion). Due to the absence of multi-user interference and the ability to let users adaptively switch between transmission and dormancy, the proposed policy greatly improves spectrum and energy efficiency over stationary policies, and ensures no interference to primary users. In addition, the proposed policy has the following desirable properties. First, the policy achieves high energy efficiency even when the users have erroneous and binary feedback about the interference and noise power levels at their receivers. Second, it allows users to enter and leave the system without affecting the throughput and energy efficiency of the users in the network. Third, the policy is deviation-proof, namely autonomous users will find it in their self-interests to follow it. Fourth, it can be implemented by autonomous users in a decentralized manner. Compared to state-of-the-art policies, the proposed policies can achieve an energy saving of up to 80% when the number of users is large or the multi-user interference is strong.



I. Introduction 

A key challenge associated with cognitive radio networks is determining efficient solutions for secondary users (SUs) to share the spectrum with primary users (PUs) without degrading PUs' quality of service (QoS) [1]. The spectrum sharing policies, which specify the PUs' and SUs' transmission schedules and transmit power levels, are essential to achieve spectrum and energy efficiency [2]. Research on designing spectrum sharing policies can be roughly divided into two main categories. The research in the first category formulates the spectrum sharing problem as a utility maximization problem subject to the users' maximum transmit power constraints [3]-[10]. Many works in this category [3]-[7] define the utility function as an increasing function of the signal-to-interference-and-noise ratio (SINR), while neglecting the energy consumption of the resulting spectrum sharing policies. Some other works in this category [8]-[10] define the utility function as the ratio of throughput to transmit power, in order to maximize the spectrum efficiency per unit of energy consumed. Research in the second category [11]-[18] formulates the spectrum sharing problem as an energy consumption minimization problem subject to the users' minimum throughput requirements. In this formulation, the users' throughput requirements can be explicitly specified. Hence, the spectrum efficiency is guaranteed with the minimal energy consumption. The work in this paper pertains to this second category of research works.

One major limitation of most existing works [11]-[18] is that they restrict attention to a simple class of spectrum sharing policies that require the users to transmit at fixed power levels as long as the environment (e.g. the number of users, the channel gains) does not change¹. We call this class of spectrum sharing policies stationary. The stationary policies are not energy efficient because, due to multi-user interference, the users need to transmit at high power levels to fulfill the minimum throughput constraints. To improve energy efficiency, we study nonstationary² spectrum sharing policies. Specifically, we focus on TDMA (time-division multiple access) spectrum sharing policies, a class of nonstationary policies in which the users transmit in a TDMA fashion. TDMA policies can achieve high spectrum efficiency that is not achievable under stationary policies, and greatly improve the energy efficiency of the stationary policies, for the following two reasons. First, there is no multi-user interference in TDMA policies. Second, TDMA policies allow users to adaptively switch between transmission and dormancy, depending on the average throughput they have achieved, in order to save energy. Note that in the optimal TDMA policies we propose, users usually do not transmit in the simple round-robin fashion, because of the heterogeneity in their minimum throughput requirements and channel conditions (see Section IV for a motivating example that shows the sub-optimality of round-robin TDMA policies).

1 Although some spectrum sharing policies [11]-[18] go through a transient period of adjusting the power levels before converging to the optimal power levels, the users maintain fixed power levels after the convergence.

2 We use "nonstationary", instead of "dynamic", to describe the proposed policy, because "dynamic spectrum sharing" has been 
extensively used to describe general spectrum sharing policies in cognitive radio, where SUs access the channel opportunistically. 
In this sense, our policy is dynamic. However, our nonstationary policy is different from other dynamic spectrum sharing policies, 
in that the power levels are time-varying. We will provide more detailed comparisons with existing works in the next section. 
Another limitation of existing works in the second category [11]-[17] (with few exceptions such as [18]) is the lack of consideration for the fundamental requirement in cognitive radio networks: protection of PUs' QoS. PUs' QoS can be protected by imposing interference temperature (IT) constraints. Each PU's receiver estimates the local interference temperature (i.e. the interference and noise power level), feeds it back to its transmitter for power control, and, if the IT constraint is violated, broadcasts a distress signal to SUs for interference control [18]. However, in practice, PUs cannot perfectly estimate the interference temperature, and can only send limited (quantized) feedback. Hence, it is important to design spectrum sharing policies that are robust to erroneous and limited feedback. Although some work [18] considers IT constraints for PUs' QoS protection, none of the existing works [11]-[18] considers the erroneous estimation and limited feedback of interference temperature.

In this paper, we provide a novel design framework to construct TDMA spectrum sharing policies that achieve PUs' and SUs' minimum throughput requirements with minimal energy consumption, under erroneous and very limited (only binary) feedback. The proposed policy can be easily extended to networks in which PUs/SUs enter and leave, without affecting the users' spectrum and energy efficiency. Moreover, the proposed policy is deviation-proof, meaning that a user cannot improve its energy efficiency over the proposed policy while still fulfilling its throughput requirement. In this way, autonomous users will find it in their self-interest to adopt the policy. We provide two approaches, one completely decentralized and the other partially decentralized, to implement the proposed policy, depending on whether there is a spectrum server/mediator as assumed in [18], [30]-[33]. Without such an entity, the users implement the policy in a completely decentralized manner using the first approach. Alternatively, the users can let the spectrum server/mediator, if it exists, share some of the communication and computational overhead by collecting information and determining the optimal operating point before run time, and then implement the policy in a decentralized manner at run time.

The rest of the paper is organized as follows. We give detailed comparisons against existing works in Section II. Section III describes the system model for spectrum sharing. Section IV gives a motivating example to show the performance gain achieved by nonstationary policies and the necessity of deviation-proof policies. In Section V, we formulate and solve the policy design problem. In Section VI, we extend our framework in several directions, among which we consider the case where users enter and leave the network. Simulation results are presented in Section VII. Finally, Section VIII concludes the paper.
TABLE I
Comparisons against stationary spectrum sharing policies.

| Works | Energy-efficient | Deviation-proof | Feedback (Overhead) | User number |
|---|---|---|---|---|
| [illegible] | No | No | Error-free, unquantized (Large) | Fixed |
| [illegible] | No | Against stationary policies | Error-free, unquantized (Large) | Fixed |
| [illegible] | Yes | Against stationary policies | Error-free, unquantized (Large) | Fixed |
| [illegible] | Yes | Against stationary policies | Error-free, unquantized (Large) | Varying |
| [illegible] | No | Against stationary and nonstationary policies | Error-free, unquantized (Large) | Fixed |
| Proposed | Yes | Against stationary and nonstationary policies | Erroneous, binary (One-bit) | Varying |

II. Related Works 

First, we note that only a few works, such as [18], study the energy consumption minimization problem with minimum throughput requirements in cognitive radio networks. Nevertheless, we compare against a broad class of related works to highlight our differences.

A. Stationary Spectrum Sharing Policies 

Table I categorizes existing stationary spectrum sharing policies based on four criteria: whether the policy considers energy efficiency, whether the policy is deviation-proof (against stationary or nonstationary policies), what the feedback requirements and the corresponding overhead are, and whether the policy can accommodate a fixed or varying number of users. Throughout this section, feedback is defined as any information (e.g. interference and noise power levels) sent from a user's receiver to its transmitter.

Note that we put [19]-[21] in the category of stationary policies, although they design policies in a repeated game framework. This is because, in the equilibrium at which the system operates, the policies in [19]-[21] use fixed power levels. This is in contrast with [22], which uses time-varying power levels at equilibrium and is categorized as a nonstationary policy in the next subsection.

B. Nonstationary Spectrum Sharing Policies 

We summarize the major differences between the existing nonstationary policies and our proposed policy in Table II. Now we briefly discuss the major limitations of the existing nonstationary policies.

1) Nonstationary Policies Based on Repeated Games: The major limitation of the works based on repeated games [22] is the assumption of perfect monitoring, which requires error-free and unquantized feedback. Erroneous and limited feedback is assumed in [23]. However, [23] requires that the amount of feedback increases with the number of power levels that the users can choose. In contrast, we only require binary feedback regardless of the number of power levels, which significantly reduces the feedback overhead.

TABLE II
Limitations of existing nonstationary spectrum sharing policies.

| Works | Energy-efficient | Power control | Users | Feedback (Overhead) | Deviation-proof | User number |
|---|---|---|---|---|---|---|
| [22] | No | Yes | Heterogeneous | Error-free, unquantized (Large) | Yes | Fixed |
| [23] | No | Applicable | Heterogeneous | Erroneous, limited (Medium) | Yes | Fixed |
| [24] | No | No | Homogeneous | Erroneous, binary (One-bit) | No | Fixed |
| [25] | Yes | No | Homogeneous | Erroneous, binary (One-bit) | No | Fixed |
| [26]-[28] | No | No | Homogeneous | Error-free, binary (One-bit) | No | Fixed |
| Proposed | Yes | Yes | Heterogeneous | Erroneous, binary (One-bit) | Yes | Varying |

2) Nonstationary Policies Based on MDP: Many works developed optimal nonstationary policies based on Markov decision processes (MDPs) (see representative works [24], [25]). However, most of the MDP-based approaches solve only single-user decision problems, and cannot be easily extended to the case where multiple users compete for a single resource.

3) Nonstationary Policies Based on Multi-armed Bandits: Nonstationary policies based on multi-armed bandits (MAB) have been proposed in [26]-[28]. First, [26]-[28] focus on channel selection problems without considering power control, while our work focuses on power control problems. In addition, [26]-[28] assume that the users are homogeneous, while in our work we consider heterogeneous users. Moreover, [26]-[28] do not consider the case where users enter and leave the network. Finally, the policies in [26]-[28] are not deviation-proof.

C. Comparison With Our Previous Work 

In this subsection, we summarize the differences between this work and our previous work [29], which proposed a design framework for optimal nonstationary spectrum sharing policies.

First, the design frameworks are significantly different because their design objectives are different. In [29], we aimed to design TDMA spectrum sharing policies that maximize the users' total throughput without considering energy efficiency. Under this design objective, each user transmits at the maximum power level in its slot, as long as the IT constraint is not violated. Hence, we optimized only the transmission schedule of the users. In this work, since we aim to minimize the energy consumption subject to the minimum throughput requirements, we need to optimize both the transmission schedule and the users' transmit power levels, which makes the design problem more challenging.
Now we explain the differences between the design frameworks in detail; they will also be illustrated later in Fig. 3. Both design frameworks include three steps: characterization of the set of feasible operating points, selection of the optimal operating point, and distributed implementation of the policy. The fundamental difference is in the first step, which is the most important one. In [29], since each user transmits at the maximum power level in its slot, the set of feasible operating points lies in the hyperplane determined by the users' maximum achievable throughputs. Hence, we only need to determine which portion of this particular hyperplane is achievable. In contrast, in this work, since the users may not transmit at the maximum power levels in their slots, the feasible operating points lie in a collection of hyperplanes, each of which goes through the vector of minimum throughput requirements. Hence, it is more difficult to characterize the set of feasible operating points in this work. Due to this more complicated characterization, the selection of the optimal operating point (the second step) also becomes a more complicated optimization problem (although we can prove that it can be converted to a convex optimization problem under reasonable assumptions). In summary, the first two steps of the design framework in this work are fundamentally different from those in [29], and are more challenging.

Both design frameworks have similar third steps: given the optimal operating point obtained in the second step, each user runs a simple and intuitive algorithm that achieves the optimal operating point in a decentralized manner. However, in this work, we further take advantage of the simplicity and intuition of the algorithm, and extend it to the scenario in which PUs/SUs enter and leave the network. This makes the proposed framework more robust to user dynamics than the framework in [29].

In this work, we also address other practical considerations that are not considered in [29]. First, we assume that there are multiple PUs, instead of a single PU as in [29]. Second, we include the PUs' power control problem in the design framework, in order to improve the energy efficiency of the PUs. In contrast, in [29], we assumed an IT constraint for the PU and did not optimize the PU's power control. The optimization of PUs' power control studied in this work is particularly important when there are multiple PUs, which may cause large interference to each other if their power control is not optimized.

Finally, we extend our results to the case in which users are not selfish. Although the policies are then no longer deviation-proof, we can achieve better performance, because the set of achievable Pareto-optimal operating points is larger when we drop the incentive constraints of the users. In [29], we did not discuss the case of obedient users.
Fig. 1. An example system model with two primary users (transmitter-receiver pairs 1 and 2) and a secondary user (transmitter-receiver pair 3). The solid line represents a link for intended data transmission, the dotted line represents the interference from another user, and the dashed line indicates a link for distress signals sent from a PU to an SU.



III. System Model 

A. Model For Spectrum Sharing in Cognitive Radio Networks 

We consider a cognitive radio network that consists of M primary users and N secondary users transmitting in a single frequency channel (see Fig. 1 for an exemplary system model). The set of PUs and that of SUs are denoted by $\mathcal{M} = \{1, 2, \ldots, M\}$ and $\mathcal{N} = \{M+1, M+2, \ldots, M+N\}$, respectively. Each user³ has a transmitter and a receiver. The channel gain from user i's transmitter to user j's receiver is $g_{ij}$. Each user i chooses its power level $p_i$ from a compact set $\mathcal{P}_i \subset \mathbb{R}_+$. We assume that $0 \in \mathcal{P}_i$, namely user i can choose not to transmit. The set of joint power profiles is denoted by $\mathcal{P} = \prod_{i \in \mathcal{M} \cup \mathcal{N}} \mathcal{P}_i$, and the joint power profile of all the users is denoted by $\mathbf{p} = (p_1, \ldots, p_{M+N}) \in \mathcal{P}$. Let $\mathbf{p}_{-i}$ be the power profile of all the users other than user i. Each user i's throughput is a function of the joint power profile, namely $r_i : \mathcal{P} \to \mathbb{R}_+$. Since the users cannot jointly decode their signals, each user i treats the interference from the other users as noise, and obtains the following throughput at the power profile $\mathbf{p}$:

$$r_i(\mathbf{p}) = \log_2\left(1 + \frac{p_i g_{ii}}{\sum_{j \in \mathcal{M} \cup \mathcal{N},\, j \neq i} p_j g_{ji} + \sigma_i^2}\right), \tag{1}$$

where $\sigma_i^2$ is the noise power at user i's receiver.
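As a quick numerical reading of (1), here is a minimal sketch; users are indexed from 0 in the code, and the two-user gains and noise values are illustrative assumptions, not parameters from the paper. The entry `g[j][i]` follows the convention above: the gain from user j's transmitter to user i's receiver.

```python
import math


def throughput(i, p, g, noise):
    """Throughput r_i(p) from (1); users are indexed from 0 here.

    p[j]:     transmit power of user j
    g[j][i]:  channel gain from user j's transmitter to user i's receiver
    noise[i]: noise power sigma_i^2 at user i's receiver
    """
    # Signals of all other users are treated as noise (no joint decoding).
    interference = sum(p[j] * g[j][i] for j in range(len(p)) if j != i)
    sinr = p[i] * g[i][i] / (interference + noise[i])
    return math.log2(1 + sinr)


# Two users, direct gains 1, cross gains 0.5, unit noise (illustrative values only).
g = [[1.0, 0.5], [0.5, 1.0]]
noise = [1.0, 1.0]
print(throughput(0, [3.0, 0.0], g, noise))  # user 2 silent: log2(1 + 3) = 2.0
print(throughput(0, [3.0, 2.0], g, noise))  # user 2 interferes: log2(1 + 3 / 2)
```

The second call shows the multi-user interference penalty that a TDMA policy avoids by letting only one user transmit per slot.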

We define user i's local interference temperature $I_i(\mathbf{p}_{-i})$ as the interference and noise power level at its receiver, namely $I_i(\mathbf{p}_{-i}) = \sum_{j \in \mathcal{M} \cup \mathcal{N},\, j \neq i} p_j g_{ji} + \sigma_i^2$. We assume that each user i measures the interference temperature with errors. The estimate of $I_i$ is $\tilde{I}_i = I_i + \varepsilon_i$, where $\varepsilon_i$ is the additive estimation error with a probability density function $f_{\varepsilon_i}$ known to user i. Each user i's receiver quantizes $\tilde{I}_i$ before feeding it back to the transmitter. The quantization function is written as $Q_i : \mathbb{R} \to \mathcal{Q}_i$, with $\mathcal{Q}_i$ being a finite set of reconstruction values. Given the estimate $\tilde{I}_i$, user i's receiver sends the reconstruction value $Q_i(\tilde{I}_i)$ to its transmitter.

3 We refer to a primary user or a secondary user as a user in general, and will specify the type of users only when necessary.

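In this paper, we assume that each user's receiver uses an unbiased estimator such that $\mathbb{E}_{\varepsilon_i}\{\tilde{I}_i(\mathbf{p}_{-i})\} = I_i(\mathbf{p}_{-i})$ for any $\mathbf{p}_{-i}$, where $\mathbb{E}_{\varepsilon_i}\{\cdot\}$ is the expectation over $\varepsilon_i$, and a simple two-level quantizer that preserves the mean value of $\tilde{I}_i(\mathbf{p}_{-i})$ when there is no multi-user interference. In other words, when $\mathbf{p}_{-i} = \mathbf{0}$ (i.e. $I_i(\mathbf{p}_{-i}) = \sigma_i^2$), the quantizer should satisfy $\mathbb{E}_{\varepsilon_i}\{Q_i(\tilde{I}_i(\mathbf{p}_{-i}))|_{\mathbf{p}_{-i}=\mathbf{0}}\} = \mathbb{E}_{\varepsilon_i}\{\tilde{I}_i(\mathbf{p}_{-i})|_{\mathbf{p}_{-i}=\mathbf{0}}\}$, and thus $\mathbb{E}_{\varepsilon_i}\{Q_i(\tilde{I}_i(\mathbf{p}_{-i}))|_{\mathbf{p}_{-i}=\mathbf{0}}\} = I_i(\mathbf{0}) = \sigma_i^2$. An example two-level quantizer is

$$Q_i(\tilde{I}_i(\mathbf{p}_{-i})) = \begin{cases} \underline{I}_i, & \tilde{I}_i(\mathbf{p}_{-i}) \le \theta_i \\ \overline{I}_i, & \text{otherwise} \end{cases}, \quad \forall \mathbf{p}_{-i} \in \mathcal{P}_{-i},\ \forall i, \tag{2}$$

with

$$\underline{I}_i = \frac{\int_{x - \sigma_i^2 \in \mathrm{supp}(f_{\varepsilon_i}),\, x \le \theta_i} x\, f_{\varepsilon_i}(x - \sigma_i^2)\,dx}{\int_{x - \sigma_i^2 \in \mathrm{supp}(f_{\varepsilon_i}),\, x \le \theta_i} f_{\varepsilon_i}(x - \sigma_i^2)\,dx}, \qquad \overline{I}_i = \frac{\int_{x - \sigma_i^2 \in \mathrm{supp}(f_{\varepsilon_i}),\, x > \theta_i} x\, f_{\varepsilon_i}(x - \sigma_i^2)\,dx}{\int_{x - \sigma_i^2 \in \mathrm{supp}(f_{\varepsilon_i}),\, x > \theta_i} f_{\varepsilon_i}(x - \sigma_i^2)\,dx},$$

where $\mathrm{supp}(f_{\varepsilon_i})$ is the support of the density $f_{\varepsilon_i}$, and $\theta_i$ is the quantization threshold. That is, $\underline{I}_i$ and $\overline{I}_i$ are the conditional means of the noise-only estimate $\sigma_i^2 + \varepsilon_i$ below and above the threshold, which is what makes the quantizer mean-preserving. In practice, it is easy to implement an unbiased estimator and a simple two-level quantizer as in (2). As we will show later, such an estimator and quantizer are sufficient to achieve the optimal performance.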
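To make the mean-preserving property of the two-level quantizer concrete, here is a minimal sketch that assumes, purely for illustration, a zero-mean Gaussian estimation error $\varepsilon_i \sim \mathcal{N}(0, s^2)$ (the model above only requires $f_{\varepsilon_i}$ to be known to user i). Taking the reconstruction values as the conditional means of the noise-only estimate on each side of $\theta_i$, they become truncated-normal means with closed forms. Function names and parameter values are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def two_level_levels(sigma2, err_std, theta):
    """Reconstruction values (I_low, I_high) and Pr(estimate <= theta).

    Assumption: eps_i ~ N(0, err_std^2), so with no multi-user interference the
    estimate sigma2 + eps_i is N(sigma2, err_std^2). Each level is the conditional
    mean of that estimate on its side of the threshold (truncated-normal means).
    """
    a = (theta - sigma2) / err_std
    p_low = normal_cdf(a)
    i_low = sigma2 - err_std * normal_pdf(a) / p_low
    i_high = sigma2 + err_std * normal_pdf(a) / (1.0 - p_low)
    return i_low, i_high, p_low


def quantize(estimate, i_low, i_high, theta):
    """Two-level quantizer: low level at or below the threshold, high level above it."""
    return i_low if estimate <= theta else i_high


i_low, i_high, p_low = two_level_levels(sigma2=1.0, err_std=0.3, theta=1.2)
# Mean preservation: Pr(low) * I_low + Pr(high) * I_high equals sigma_i^2 = 1.0.
mean_q = p_low * i_low + (1.0 - p_low) * i_high
print(i_low, i_high, mean_q)
```

The two conditional-mean corrections cancel exactly in `mean_q`, which is why the quantized feedback has the same mean as the noise power even though each individual report is coarse.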

Remark 1: Here is an intuition for why an unbiased estimator and the two-level quantizer in (2) are good enough for us. For user i to achieve a minimum throughput $r_i$ given the feedback $Q_i(\tilde{I}_i)$, its transmit power level should be $p_i = (2^{r_i} - 1) \cdot Q_i(\tilde{I}_i)/g_{ii}$. In a TDMA policy, there is no multi-user interference (i.e. $\mathbf{p}_{-i} = \mathbf{0}$) when user i transmits. Hence, using an unbiased estimator that satisfies $\mathbb{E}_{\varepsilon_i}\{\tilde{I}_i\} = I_i$, and the quantizer in (2), which satisfies $\mathbb{E}_{\varepsilon_i}\{Q_i(\tilde{I}_i)\} = \mathbb{E}_{\varepsilon_i}\{\tilde{I}_i\}$ when $\mathbf{p}_{-i} = \mathbf{0}$, user i's expected transmit power level is

$$\mathbb{E}_{\varepsilon_i}\{p_i\} = \mathbb{E}_{\varepsilon_i}\{(2^{r_i} - 1) \cdot Q_i(\tilde{I}_i)/g_{ii}\} = (2^{r_i} - 1)\,\mathbb{E}_{\varepsilon_i}\{Q_i(\tilde{I}_i)\}/g_{ii} = (2^{r_i} - 1)\,\sigma_i^2/g_{ii}, \tag{3}$$

which is exactly the transmit power level when user i perfectly knows the interference temperature $\sigma_i^2$. In contrast, under a non-TDMA policy, there is multi-user interference. In this case, one user's erroneous and quantized feedback affects its own transmit power level, which in turn affects the other users' transmit power levels through the interference caused by this user. Thus, all the users' transmit power levels are coupled through the interference under estimation and quantization errors. Hence, an unbiased estimator and the simple two-level quantizer in (2) may result in performance loss under non-TDMA policies.
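The claim in (3) can also be checked by simulation. The sketch below again assumes, for illustration only, a Gaussian estimation error; it sets the two reconstruction values to the empirical conditional means of a calibration batch, then averages the transmit power $(2^{r_i}-1)\,Q_i(\tilde{I}_i)/g_{ii}$ over a fresh batch of noise-only estimates. All parameter values and the function name are hypothetical.

```python
import random


def simulate_expected_power(r, g_ii, sigma2, err_std, theta, n=200_000, seed=1):
    """Monte Carlo estimate of E[p_i] in a TDMA slot (no multi-user interference).

    Assumption: eps_i ~ N(0, err_std^2). The reconstruction values are the empirical
    conditional means of a calibration batch; a second batch is used for the average.
    """
    rng = random.Random(seed)
    calib = [sigma2 + rng.gauss(0.0, err_std) for _ in range(n)]
    low = [x for x in calib if x <= theta]
    high = [x for x in calib if x > theta]
    i_low, i_high = sum(low) / len(low), sum(high) / len(high)

    total = 0.0
    for _ in range(n):
        estimate = sigma2 + rng.gauss(0.0, err_std)
        q = i_low if estimate <= theta else i_high      # binary feedback Q_i
        total += (2 ** r - 1) * q / g_ii                # power needed for throughput r
    return total / n


avg = simulate_expected_power(r=1.0, g_ii=0.8, sigma2=1.0, err_std=0.3, theta=1.2)
exact = (2 ** 1.0 - 1) * 1.0 / 0.8   # (2^r - 1) sigma^2 / g_ii from (3)
print(avg, exact)
```

Individual power levels fluctuate with the one-bit feedback, but their average matches the power a user with perfect knowledge of $\sigma_i^2$ would use.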

Since each user i adopts a two-level quantizer, its feedback from the receiver to the transmitter is binary. We can then further reduce the feedback overhead as follows. Each user i's receiver informs its transmitter of the two reconstruction values $\underline{I}_i$ and $\overline{I}_i$ only once, at the beginning, after which the receiver sends a signal, possibly in the form of a simple probe, only when the estimated interference temperature $\tilde{I}_i$ exceeds the quantization threshold $\theta_i$. The event of receiving or not receiving the probing signal, which is sent only when $\tilde{I}_i > \theta_i$, is enough to indicate to user i's transmitter which of the two reconstruction values it should choose. Since the probing signal indicates high interference temperature, we call it the distress signal as in [12], [18]. With some abuse of notation, we denote user i's distress signal as $y_i \in \mathcal{Y} = \{0, 1\}$, with $y_i = 1$ representing the event that user i's distress signal is sent (i.e. $\tilde{I}_i > \theta_i$). We write $\rho_i(y_i|\mathbf{p})$ for the conditional probability distribution of user i's distress signal $y_i$ given the power profile $\mathbf{p}$, which is calculated as

$$\rho_i(y_i = 1|\mathbf{p}) = \int_{x > \theta_i - I_i(\mathbf{p}_{-i})} f_{\varepsilon_i}(x)\,dx, \quad \text{and} \quad \rho_i(y_i = 0|\mathbf{p}) = 1 - \rho_i(y_i = 1|\mathbf{p}). \tag{4}$$
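Equation (4) is simply a tail probability of the estimation error. A minimal sketch, assuming a hypothetical Gaussian error $\varepsilon_i \sim \mathcal{N}(0, s^2)$ so that the integral has a closed form via the complementary error function; the threshold and interference values are illustrative.

```python
import math


def distress_probability(theta, interference_temp, err_std):
    """rho_i(y_i = 1 | p) from (4), assuming eps_i ~ N(0, err_std^2).

    A distress signal is sent when I_i + eps_i > theta_i, i.e. eps_i > theta_i - I_i,
    so the probability is the Gaussian upper tail beyond theta_i - I_i.
    """
    z = (theta - interference_temp) / err_std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


theta = 1.2
print(distress_probability(theta, 1.0, 0.3))  # noise only (sigma_i^2 = 1): about 0.25
print(distress_probability(theta, 2.0, 0.3))  # strong interference: close to 1
```

Even with no interference at all, a distress signal is raised with nonzero probability; this is the erroneous feedback the policy must be robust to.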

Similar to [16], [17], we assume, until Section VI, that the system parameters, such as the number of users, remain fixed during the considered time horizon. The system is time slotted at $t = 0, 1, 2, \ldots$. We assume as in [16]-[18] that the users are synchronized. At the beginning of time slot t, each user i chooses its transmit power $p_i^t$ and achieves the throughput $r_i(\mathbf{p}^t)$. At the end of time slot t, each user j who transmits ($p_j^t > 0$) sends its distress signal ($y_j^t = 1$) if the estimate $\tilde{I}_j$ exceeds the threshold $\theta_j$. We define y as the distress signal indicating whether there exists a user who has sent its distress signal, namely

$$y = \begin{cases} 1, & \text{if } \exists j \text{ s.t. } p_j > 0 \text{ and } y_j = 1 \\ 0, & \text{otherwise} \end{cases}. \tag{5}$$

The conditional distribution is denoted $\rho(y|\mathbf{p})$, which is calculated as $\rho(y = 0|\mathbf{p}) = \prod_{j: p_j > 0} \rho_j(y_j = 0|\mathbf{p})$.
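The product formula for $\rho(y = 0|\mathbf{p})$ can be sketched as follows. The per-user probabilities $\rho_j(y_j = 0|\mathbf{p})$ are taken as given inputs (for example from (4)); the product form treats the users' distress events as independent, and the numbers below are made up for illustration.

```python
def prob_no_distress(powers, rho_no_distress):
    """rho(y = 0 | p): no transmitting user raises a distress signal.

    powers[j]:          p_j; only users with p_j > 0 can send a distress signal
    rho_no_distress[j]: rho_j(y_j = 0 | p) for user j at this power profile
    """
    prob = 1.0
    for p_j, rho_j0 in zip(powers, rho_no_distress):
        if p_j > 0:
            prob *= rho_j0
    return prob


# Under a TDMA profile only one user transmits, so only its own term survives.
print(prob_no_distress([0.0, 2.0, 0.0], [0.9, 0.7, 0.6]))  # 0.7
# With two simultaneous transmitters both terms enter the product.
print(prob_no_distress([1.0, 2.0, 0.0], [0.9, 0.7, 0.6]))  # approximately 0.63
```

In a TDMA slot the aggregate signal y therefore carries exactly the active user's own distress signal.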



B. Spectrum Sharing Policies 

In a general spectrum sharing policy, each user should determine its transmit power level at each time slot t based on all the available information: the history of its own transmit powers up to time t, the history of its interference temperature up to time t, and the history of the distress signals up to time t. However, the computational complexity of such a policy is high. In this paper, we focus on a class of low-complexity spectrum sharing policies, in which each user i determines the transmit power level $p_i^t$ based only on the history of distress signals. The history of distress signals at time slot $t \ge 1$ is $h^t = \{y^0; \ldots; y^{t-1}\} \in \mathcal{Y}^t$, and that at time slot 0 is $h^0 = \varnothing$. Then each user i's strategy $\pi_i$ is a mapping from the set of all possible histories $\cup_{t=0}^{\infty} \mathcal{Y}^t$ to its action set $\mathcal{P}_i$, namely $\pi_i : \cup_{t=0}^{\infty} \mathcal{Y}^t \to \mathcal{P}_i$. We define the spectrum sharing policy as the joint strategy profile of all the users, denoted by $\pi = (\pi_1, \ldots, \pi_{M+N})$. Hence, user i's transmit power level at time slot t is determined by $p_i^t = \pi_i(h^t)$, and the users' joint power profile is determined by $\mathbf{p}^t = \pi(h^t)$. We can classify all the spectrum sharing policies into two categories, stationary and nonstationary policies, as follows.
Definition 1: A spectrum sharing policy $\pi$ is stationary if and only if for all $i \in \mathcal{M} \cup \mathcal{N}$, for all $t \ge 0$, and for all $h^t \in \mathcal{Y}^t$, we have $\pi_i(h^t) = p_i^{\rm stat}$, where $p_i^{\rm stat} \in \mathcal{P}_i$ is a constant. A spectrum sharing policy is nonstationary if it is not stationary.

To further reduce the computational complexity of the spectrum sharing policy, we restrict our attention to a special class of nonstationary policies, namely the TDMA policies with fixed power levels defined as follows.

Definition 2 (TDMA policies with fixed power levels): A spectrum sharing policy $\pi$ is a TDMA policy if at most one user transmits in each time slot. A spectrum sharing policy $\pi$ is a TDMA policy with fixed power levels if it is a TDMA policy and each user i chooses the same power level $p_i^{\rm TDMA} \in \mathcal{P}_i$ whenever it transmits.

A TDMA policy with fixed power levels is completely specified by each user i's transmit power level $p_i^{\rm TDMA}$ when it transmits and by the schedule of which user transmits at each time slot t. Hence, such a policy can be relatively easily constructed by the designer and implemented by the users. Since we focus on this special class of policies, we refer to a "TDMA policy with fixed power levels" simply as a "TDMA policy" in the rest of the paper.
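Because a TDMA policy with fixed power levels is fully described by the fixed powers and a schedule, it can be represented very compactly. The sketch below is one possible encoding, assuming a hypothetical `schedule` function that maps a distress-signal history to the index of the single transmitting user; the round-robin schedule and power values are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

History = Tuple[int, ...]  # h^t = (y^0, ..., y^{t-1}) with each y in {0, 1}


def make_tdma_policy(powers: Sequence[float],
                     schedule: Callable[[History], int]) -> Callable[[History], List[float]]:
    """Return pi: history -> joint power profile for a TDMA policy with fixed power levels.

    powers[i] plays the role of p_i^TDMA; schedule(h) returns the index of the single
    user that transmits after history h, or -1 for an idle slot.
    """
    def pi(history: History) -> List[float]:
        active = schedule(history)
        # At most one entry is nonzero, which is exactly the TDMA property.
        return [powers[i] if i == active else 0.0 for i in range(len(powers))]
    return pi


def round_robin_3(history: History) -> int:
    """Hypothetical schedule: cycle over 3 users and ignore the distress signals."""
    return len(history) % 3


pi = make_tdma_policy([1.0, 2.5, 0.8], round_robin_3)
print(pi(()))      # slot 0: [1.0, 0.0, 0.0]
print(pi((0, 1)))  # slot 2: [0.0, 0.0, 0.8]
```

Since the schedule receives the whole history, a history-dependent schedule (for example one that reacts to the last distress signal) fits the same interface.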

Remark 2: In the formal definition of a nonstationary policy, it seems that each user needs to keep track of the history of all the past distress signals to determine the transmit power at each time slot. However, as we will see from the algorithm in Table III that implements the proposed policy, each user only needs a finite memory.

C. Definition of Spectrum and Energy Efficiency 

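The spectrum and energy efficiency of a spectrum sharing policy are characterized by the users' average throughput and average energy consumption, respectively. A user's average throughput is defined as the expected discounted average throughput per time slot. Assuming as in [19]-[23] that all the users have the same discount factor $\delta \in [0, 1)$, user i's average throughput is

$$R_i(\pi) = (1 - \delta) \sum_{t=0}^{\infty} \delta^t\, \mathbb{E}_{h^t}\{ r_i(\mathbf{p}^t) \},$$

where $\mathbf{p}^0$ is determined by $\mathbf{p}^0 = \pi(\varnothing)$, $\mathbf{p}^t$ for $t \ge 1$ is determined by $\mathbf{p}^t = \pi(h^t) = \pi(h^{t-1}; y^{t-1})$, and the expectation is over the distress-signal histories, whose probabilities are $\Pr(h^t) = \prod_{k=0}^{t-1} \rho(y^k|\mathbf{p}^k)$ with $\mathbf{p}^k = \pi(h^k)$. Similarly, user i's average energy consumption is the expected discounted average transmit power per time slot, written as

$$P_i(\pi) = (1 - \delta) \sum_{t=0}^{\infty} \delta^t\, \mathbb{E}_{h^t}\{ p_i^t \}.$$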
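For a schedule that does not depend on the distress signals, the expectations above are trivial and the two averages are plain discounted sums. The following sketch computes them under that assumption, truncating the infinite sum at a finite horizon; it uses the fact that a lone transmitter sees no multi-user interference, so its slot throughput is $\log_2(1 + p_i g_{ii}/\sigma_i^2)$. The function name and inputs are hypothetical.

```python
import math


def discounted_averages(schedule, powers, gains, noise, delta, horizon=2000):
    """Truncated R_i and P_i for a history-independent TDMA schedule.

    schedule(t): index of the single user transmitting in slot t
    powers[i], gains[i], noise[i]: p_i^TDMA, g_ii and sigma_i^2
    """
    n = len(powers)
    R = [0.0] * n
    P = [0.0] * n
    for t in range(horizon):
        i = schedule(t)
        w = (1 - delta) * delta ** t                 # discount weight of slot t
        R[i] += w * math.log2(1 + powers[i] * gains[i] / noise[i])
        P[i] += w * powers[i]
    return R, P


def alternate(t):
    """Two-user round-robin: user 1 (index 0) in even slots, user 2 in odd slots."""
    return t % 2


R, P = discounted_averages(alternate, [3.0, 3.0], [1.0, 1.0], [1.0, 1.0], delta=0.9)
print(R, P)
```

With equal powers the first user still obtains more throughput, weighted by $1/(1+\delta)$ versus $\delta/(1+\delta)$, simply because its slots come earlier; this asymmetry reappears in the round-robin example of Section IV.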
Each user i aims to minimize its average energy consumption $P_i(\pi)$ while fulfilling a minimum throughput requirement $R_i^{\min}$. From one user's perspective, it has an incentive to deviate from a given spectrum sharing policy if, by doing so, it can fulfill the minimum throughput requirement with a lower average energy consumption. Hence, we can define deviation-proof policies as follows.

Definition 3: A spectrum sharing policy $\pi$ is deviation-proof if for all $i \in \mathcal{M} \cup \mathcal{N}$, we have

$$\pi_i = \arg\min_{\pi_i'} P_i(\pi_i', \pi_{-i}), \quad \text{subject to } R_i(\pi_i', \pi_{-i}) \ge R_i^{\min}, \tag{6}$$

where $\pi_{-i}$ is the joint strategy profile of all the users except user i.

IV. Motivation For Deviation-proof Dynamic Spectrum Sharing Policies 

Before formally describing the design framework, we provide a motivating example to show the advantage and necessity of deviation-proof TDMA policies. Consider a simple network with two symmetric users. For simplicity, the direct channel gains are both 1, and the cross channel gains are both $\alpha > 0$, i.e., $g_{ii} = 1$ and $g_{ij} = \alpha$ for all i and all $j \neq i$. The noise at each user's receiver has the same power $\sigma^2$. Both users' minimum throughput requirements are r. We first show that a simple round-robin TDMA policy is more energy-efficient than the optimal stationary policy, and that the optimal TDMA policy outperforms round-robin TDMA policies. Finally, we demonstrate the necessity of deviation-proof TDMA policies.

If the users adopt the stationary spectrum sharing policy, then to fulfill the minimum throughput requirements, their minimum transmit power levels should be $p_1^{\rm stat} = p_2^{\rm stat} = \frac{(2^r - 1)\sigma^2}{1 - (2^r - 1)\alpha}$. The average energy consumptions are then $P_i^{\rm stat} = p_i^{\rm stat}$, i = 1, 2, which increase with the cross interference level α. Moreover, the stationary policy is infeasible when $\alpha \ge \frac{1}{2^r - 1}$, namely when the cross interference level or the minimum throughput requirement is very high.
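The stationary power level above follows in one line, assuming (as the symmetric setup suggests) that both users transmit at the same power p and each user's throughput equals r exactly, with the interference $\alpha p$ from the other user treated as noise as in (1):

```latex
r = \log_2\!\left(1 + \frac{p}{\alpha p + \sigma^2}\right)
\iff p = (2^r - 1)(\alpha p + \sigma^2)
\iff p\,\bigl(1 - (2^r - 1)\alpha\bigr) = (2^r - 1)\,\sigma^2 .
```

A positive solution exists only if $(2^r - 1)\alpha < 1$, which is the feasibility condition stated above; as α approaches $1/(2^r - 1)$ from below, the required power grows without bound.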

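Now suppose that the users adopt a simple round-robin TDMA policy, in which user 1 transmits at a fixed power level $p_1^{\rm TDMA}$ in even time slots $t = 0, 2, \ldots$ and user 2 transmits at a fixed power level $p_2^{\rm TDMA}$ in odd time slots $t = 1, 3, \ldots$. The users' average throughputs are

$$R_1 = (1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} \log_2\left(1 + p_1^{\rm TDMA}/\sigma^2\right) = \frac{1}{1 + \delta} \log_2\left(1 + p_1^{\rm TDMA}/\sigma^2\right),$$

$$R_2 = (1 - \delta) \sum_{t=0}^{\infty} \delta^{2t+1} \log_2\left(1 + p_2^{\rm TDMA}/\sigma^2\right) = \frac{\delta}{1 + \delta} \log_2\left(1 + p_2^{\rm TDMA}/\sigma^2\right).$$

Given their minimum throughput requirements r, we can calculate $p_1^{\rm TDMA}$ and $p_2^{\rm TDMA}$ from the above equations, and obtain their average energy consumptions as

$$P_1^{\rm TDMA} = (1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} p_1^{\rm TDMA} = \frac{\sigma^2}{1 + \delta}\left(2^{r(1+\delta)} - 1\right),$$

$$P_2^{\rm TDMA} = (1 - \delta) \sum_{t=0}^{\infty} \delta^{2t+1} p_2^{\rm TDMA} = \frac{\delta\,\sigma^2}{1 + \delta}\left(2^{r(1 + 1/\delta)} - 1\right).$$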
November 26, 2012 DRAFT 
Fig. 2. The system parameters under which it is beneficial for at least one user to deviate.



Note that, as opposed to the stationary policy, the average transmit power in the round-robin TDMA policy is independent of the cross interference level. Hence, the round-robin TDMA policy is better under medium to high interference levels, which are exactly the scenarios in which the stationary policy may not even be feasible. For example, when the minimum throughput requirement is r = 1 and the discount factor is δ = 0.9, the round-robin TDMA policy is more energy efficient (in terms of the users' total average energy consumption) when α > 0.34.
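The 0.34 threshold can be reproduced from the closed forms above. The sketch below compares the total average energy $P_1 + P_2$ of the stationary policy and of the round-robin TDMA policy and locates the crossover in α by bisection; it assumes unit noise power (the threshold does not depend on $\sigma^2$, since both totals scale with it) and the function names are hypothetical.

```python
def stationary_total(r, alpha, sigma2=1.0):
    """Total energy 2 * p^stat of the stationary policy, or inf if it is infeasible."""
    c = 2 ** r - 1
    if c * alpha >= 1:
        return float("inf")
    return 2 * c * sigma2 / (1 - c * alpha)


def round_robin_total(r, delta, sigma2=1.0):
    """P_1^TDMA + P_2^TDMA of the round-robin policy (independent of alpha)."""
    p1 = sigma2 / (1 + delta) * (2 ** (r * (1 + delta)) - 1)
    p2 = delta * sigma2 / (1 + delta) * (2 ** (r * (1 + 1 / delta)) - 1)
    return p1 + p2


def crossover_alpha(r, delta, tol=1e-10):
    """Smallest alpha beyond which round-robin TDMA uses less total energy."""
    lo, hi = 0.0, 1.0 / (2 ** r - 1)   # the stationary policy is infeasible at hi
    target = round_robin_total(r, delta)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if stationary_total(r, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


print(round(crossover_alpha(r=1.0, delta=0.9), 3))  # about 0.336
```

Bisection is enough here because the stationary total increases monotonically in α while the round-robin total is constant.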

Under the same parameters (i.e. r = 1 and δ = 0.9), the optimal TDMA policy that achieves r = 1 with the minimum total average energy consumption is not a round-robin TDMA policy. The transmission schedule of the first few time slots is "1221 1221 12...", which seems to follow an irregular pattern (instead of a round-robin fashion). We will show how to construct the optimal TDMA policy in Section V, and its performance will be evaluated under different system parameters in Section VII.

Even if a TDMA policy is already energy-efficient, a user may want to deviate from it to achieve 
higher energy efficiency. We derive the conditions under which it is beneficial for a user to deviate from 
a given policy in the following lemma. 

Lemma 1: Suppose that under a given TDMA policy, user $i$ transmits at power level $p_i^t$ at time $t$ and user $j$ transmits at power level $p_j^{t+s}$ at time $t+s$, where $t, t+s \geq 0$ and $s \neq 0$. Then, regardless of the discount factor $\delta$, user $j$ can deviate by transmitting in both time slots $t$ and $t+s$ to achieve at least the same throughput with a lower average energy consumption, if and only if $p_j^{t+s} g_{jj} > p_i^t g_{ji}$.

Proof: See Appendix A. ■
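Lemma 1 can be probed numerically. The following sketch is our own construction (the function names, the trial power $q = 10^{-3}$ and the numeric values are hypothetical): user $j$ adds a small power $q$ in slot $t$, where it sees interference $g_{ji}p_i^t$, and lowers its power in slot $t+s$ just enough to keep its discounted throughput unchanged. In the cases below, the resulting energy change is negative exactly when $p_j^{t+s} g_{jj} > p_i^t g_{ji}$, for every discount factor tried and for $s$ of either sign.

```python
import math


def energy_change(q, p_j, g_jj, interference, noise, delta, t, s):
    """Change in user j's discounted energy (common factor 1 - delta dropped) when it
    adds power q in slot t, where it sees interference g_ji * p_i^t, and lowers its
    power in slot t + s just enough to keep its discounted throughput unchanged."""
    w_t, w_ts = delta ** t, delta ** (t + s)
    target = w_ts * math.log2(1 + p_j * g_jj / noise)                # throughput before deviating
    gained = w_t * math.log2(1 + q * g_jj / (noise + interference))  # throughput earned in slot t
    remaining_rate = (target - gained) / w_ts                        # rate still needed in slot t + s
    p_new = noise * (2 ** remaining_rate - 1) / g_jj                 # power delivering that rate
    return w_t * q + w_ts * p_new - w_ts * p_j


def lemma1_condition(p_j, g_jj, interference):
    """Lemma 1: deviation pays off iff p_j^{t+s} g_jj > p_i^t g_ji."""
    return p_j * g_jj > interference


for delta in (0.5, 0.9, 0.99):
    for t, s in ((0, 1), (3, -2)):
        for interference in (0.2, 2.0):  # values of g_ji * p_i^t
            change = energy_change(1e-3, 1.0, 1.0, interference, 0.05, delta, t, s)
            print(delta, t, s, interference, change < 0,
                  lemma1_condition(1.0, 1.0, interference))
```

To first order in $q$, the change equals $\delta^t q\,[1 - (\sigma^2 + p_j^{t+s}g_{jj})/(\sigma^2 + g_{ji}p_i^t)]$, whose sign does not depend on $\delta$; this is one way to see why the lemma holds regardless of the discount factor.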
From the above lemma, we can see that user $j$ has the incentive to deviate when $g_{ji}p_i^t$ is small, namely when the interference from user $i$ is small, and when $p_j^{t+s}$ is large, namely when user $j$'s required throughput is high.
For the same network with two symmetric users discussed previously in this section, Fig. 2 shows the range of minimum throughput requirements and cross interference levels under which it is beneficial for at least one user to deviate from the round-robin TDMA policy described above. We demonstrate two scenarios with different noise powers. Under a wide range of parameter values, the users have an incentive to deviate, which demonstrates the importance of deriving deviation-proof policies.

V. A Design Framework For Spectrum and Energy Efficient Policies 

In this section, we first formulate the policy design problem for spectrum and energy efficient spectrum 
sharing and outline the procedure to solve it. Then we show in detail how to solve the design problem 
for the optimal TDMA policy and how to implement the optimal policy. 

A. Formulation of The Design Problem 

The goal of the spectrum manager is to come up with a deviation-proof TDMA policy that fulfills all the users' minimum throughput requirements and optimizes a certain energy efficiency criterion. The energy efficiency criterion can be represented by a function defined on all the users' average energy consumptions, $E(P_1(\pi), \ldots, P_{M+N}(\pi))$. An example of an energy efficiency criterion is the weighted sum of all the users' energy consumptions, i.e., $E(P_1(\pi), \ldots, P_{M+N}(\pi)) = \sum_{i\in\mathcal{M}\cup\mathcal{N}} w_i P_i(\pi)$ with $w_i \geq 0$ and $\sum_{i\in\mathcal{M}\cup\mathcal{N}} w_i = 1$. Each user $i$'s weight $w_i$ reflects the importance of this user. For example, we could set higher weights for PUs and lower weights for SUs. Given each user $i$'s minimum throughput requirement $R_i^{\min}$, we can formally define the policy design problem as

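$$\begin{aligned} \min_{\pi}\ & E(P_1(\pi), \ldots, P_{M+N}(\pi)) \\ \text{s.t.}\ & \pi \text{ is a deviation-proof TDMA policy}, \\ & R_i(\pi) \geq R_i^{\min},\ \forall i \in \mathcal{M}\cup\mathcal{N}. \end{aligned} \qquad (7)$$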

We outline the proposed design framework to solve the policy design problem (illustrated in Fig. 3), which consists of three steps. First, we characterize the set of feasible operating points that can be achieved by deviation-proof TDMA policies. Then, given this set, we select the optimal operating point based on the energy efficiency criterion. Finally, we construct the deviation-proof TDMA policy that achieves the optimal operating point. In the following, we describe these three steps in detail.

B. Solving The Policy Design Problem 

1) Characterize the set of feasible operating points: The first step in solving the design problem (7) is to quantify the set of feasible operating points that can be achieved by deviation-proof TDMA policies.
Fig. 3. The design framework to solve the policy design problem. The feasible operating points lie in different hyperplanes (red dashed lines) that go through the vector of minimum throughput requirements (the blue square). This is the key difference from the design framework in [29, Fig. 3], in which all the feasible operating points lie in one hyperplane.



Specifically, we define the operating point of a TDMA policy as $\mathbf{r} = (r_1, \ldots, r_{M+N})$, which is a collection of each user $i$'s instantaneous throughput $r_i$ when it transmits. Since there is no multi-user interference in a TDMA policy, each user $i$'s operating point is $r_i = \log_2(1 + p_i^{\rm TDMA} g_{ii}/\sigma_i^2)$. Conversely, given the operating point $\mathbf{r}$, we can determine the users' transmit power levels $\mathbf{p}^{\rm TDMA} = (p_1^{\rm TDMA}, \ldots, p_{M+N}^{\rm TDMA})$. We sometimes write the users' transmit power levels in a TDMA policy as a function of the operating point, i.e., $\mathbf{p}^{\rm TDMA}(\mathbf{r}) = (p_1^{\rm TDMA}(r_1), \ldots, p_{M+N}^{\rm TDMA}(r_{M+N}))$. A feasible operating point is defined as follows.
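The one-to-one map between an operating point and the TDMA power levels can be written down directly. A minimal sketch, assuming one scalar noise power and one direct gain per user; the function names are illustrative only.

```python
import math


def operating_point(p, g_ii, noise):
    """r_i = log2(1 + p_i^TDMA * g_ii / sigma_i^2): user i's rate when it transmits alone."""
    return math.log2(1 + p * g_ii / noise)


def tdma_power(r, g_ii, noise):
    """Inverse map p_i^TDMA(r_i) = sigma_i^2 * (2**r_i - 1) / g_ii."""
    return noise * (2 ** r - 1) / g_ii


def tdma_power_profile(r_vec, gains, noises):
    """p^TDMA(r) = (p_1^TDMA(r_1), ..., p_{M+N}^TDMA(r_{M+N}))."""
    return [tdma_power(r, g, n) for r, g, n in zip(r_vec, gains, noises)]


profile = tdma_power_profile([2.0, 4.0], [1.0, 1.0], [0.05, 0.05])
print(profile)  # approximately [0.15, 0.75]
```

With unit direct gains, 0.05 W noise and $\mathbf{r} = (2, 4)$, this gives approximately 0.15 W and 0.75 W, the TDMA power levels used in the two-user illustration of Section VII.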

Definition 4 (Feasible Operating Points): An operating point $\mathbf{r}$ is feasible (for the minimum throughput requirements $\{R_i^{\min}\}_{i\in\mathcal{M}\cup\mathcal{N}}$) if there exists a deviation-proof TDMA policy $\pi$ that satisfies

• each user $i$'s power level is $\pi_i(h^t) = p_i^{\rm TDMA}(r_i)$, $\forall h^t$ such that $\pi_i(h^t) > 0$;

• each user $i$ achieves its minimum throughput requirement, i.e., $R_i(\pi) = R_i^{\min}$.

Note that whether a deviation-proof TDMA policy can fulfill the minimum throughput requirements depends not only on the power levels $\mathbf{p}^{\rm TDMA}(\mathbf{r})$, but also on the transmission schedule.

Before quantifying the set of feasible operating points, we define the benefit from deviation as follows. 

Definition 5 (Benefit from Deviation): We define user $j$'s benefit from deviation from interfering with user $i$'s transmission as

$$b_{ij} = \sup_{p_j \in \mathcal{P}_j,\ p_j \neq p_j^i} \frac{p(y=1\mid\mathbf{p}^i) - p(y=1\mid p_j, \mathbf{p}_{-j}^i)}{r_j(p_j, \mathbf{p}_{-j}^i)/r_j}, \qquad (8)$$

where $\mathbf{p}^i = (p_i^{\rm TDMA}(r_i), \mathbf{p}_{-i} = \mathbf{0})$ is the joint power profile when user $i$ transmits in a TDMA policy.






As we will see in Theorem 1, if the operating point $\mathbf{r}$ can be achieved by deviation-proof policies, the benefit from deviation $b_{ij}$ must be strictly smaller than 0 for all $i$ and $j \neq i$. Since the throughput $r_j$ is always larger than 0, $b_{ij} < 0$ is equivalent to $p(y=1\mid p_j, \mathbf{p}_{-j}^i) > p(y=1\mid\mathbf{p}^i)$ for all $p_j \neq p_j^i$, which means that the probability of the distress signal (which indicates deviation) increases when deviation happens. This guarantees that any deviation from $\mathbf{p}^i$ by user $j$ can be statistically identified. We can also observe that the benefit from deviation is related to the throughput user $j$ obtains by deviation, $r_j(p_j, \mathbf{p}_{-j}^i)$: the smaller the throughput obtained by deviation, the smaller the benefit from deviation.
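Definition (8) can be evaluated by brute force once a model for the distress signal is fixed. The detection model is not specified in this section, so the sketch below assumes a hypothetical one: the distress signal $y = 1$ is raised when the power received at the LSS, $\sum_k g_{0k}p_k$, plus a Gaussian measurement error exceeds a threshold (1 W and error variance 0.1, borrowing the default values of Section VII). The supremum over $\mathcal{P}_j$ is approximated on a grid of $p_j \in (0, p_{\max}]$ with a hypothetical $p_{\max} = 2$ W; all names are illustrative.

```python
import math


def distress_prob(received_power, threshold, err_std):
    """Hypothetical model: P(received_power + eps > threshold), eps ~ N(0, err_std**2)."""
    return 0.5 * math.erfc((threshold - received_power) / (err_std * math.sqrt(2)))


def benefit_from_deviation(p_i, g_0i, g_0j, g_jj, g_ji, noise, r_j,
                           threshold=1.0, err_std=math.sqrt(0.1),
                           p_max=2.0, steps=2000):
    """Grid approximation of b_ij in (8) over p_j in (0, p_max].

    Numerator: p(y=1 | p^i) - p(y=1 | p_j, p^i_{-j});
    denominator: r_j(p_j, p^i_{-j}) / r_j.
    """
    base = distress_prob(g_0i * p_i, threshold, err_std)  # only user i transmits
    best = -math.inf
    for k in range(1, steps + 1):                         # p_j^i = 0 is excluded
        p_j = p_max * k / steps
        prob = distress_prob(g_0i * p_i + g_0j * p_j, threshold, err_std)
        rate = math.log2(1 + p_j * g_jj / (noise + g_ji * p_i))
        best = max(best, (base - prob) / (rate / r_j))
    return best


print(benefit_from_deviation(0.15, 1.0, 1.0, 1.0, 0.5, 0.05, 4.0))  # negative
print(benefit_from_deviation(0.15, 1.0, 0.0, 1.0, 0.5, 0.05, 4.0))  # deviation invisible
```

When deviation raises the distress probability ($g_{0j} > 0$), every grid term has a negative numerator and $b_{ij} < 0$; when deviation is invisible to the detector ($g_{0j} = 0$), the numerator vanishes and $b_{ij} = 0$, so Condition 1 of Theorem 1 would fail.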

Now we state Theorem 1, which characterizes the set of feasible operating points.

Theorem 1: An operating point $\mathbf{r}$ is feasible for the minimum throughput requirements $\{R_i^{\min}\}_{i\in\mathcal{M}\cup\mathcal{N}}$ if the following conditions are satisfied:

• Condition 1: the benefit from deviation satisfies $b_{ij} < 0, \forall i, \forall j \neq i$.

• Condition 2: the discount factor 5 satisfies 5 > d = 1/ N _ 1+ ^ ~ 



where p. = maxj^j 1 - . 



• Condition 3: $\sum_{i\in\mathcal{M}\cup\mathcal{N}} R_i^{\min}/r_i = 1$ and $r_i < R_i^{\min}/\mu_i, \forall i$.

Proof: Due to space limits, we only outline the main idea of the proof (illustrated in Fig. 4). Please refer to [?, Appendix B] for the complete proof.

The proof relies heavily on the concept of self-generating sets [34]. Simply put, a self-generating set is a set in which every payoff is an equilibrium payoff [34]. Given the vector of minimum throughput requirements (the blue square in Fig. 4), we first find an operating point $\mathbf{r}$, namely a collection of throughput vectors, whose convex hull includes the vector of minimum throughput requirements (see the red dots as an operating point and the dotted red line as the convex hull). Then we identify the largest self-generating set (the green line segment) in the convex hull. If the self-generating set includes the vector of minimum throughput requirements, we say the operating point is feasible.

In the theorem, Conditions 1 and 2 are both sufficient conditions for the self-generating set to exist for a given operating point $\mathbf{r}$. Since the boundary of the largest self-generating set is $\{\mu_i\}_{i\in\mathcal{M}\cup\mathcal{N}}$, Condition 3 ensures that the vector of minimum throughput requirements is in the self-generating set. Hence, Conditions 1-3 are sufficient conditions for an operating point to be feasible. ■

Theorem 1 provides sufficient conditions for the existence of feasible operating points. Condition 1 ensures that when user $i$ transmits, any other user $j$ has no incentive to interfere. Condition 2 specifies the lower bound on the discount factor. From the analytical expression we obtained, the lower bound on $\delta$ is increasing in the number of users and decreasing in $p(y=1\mid\mathbf{p}^i)$. It is important to know how this lower bound varies with the system parameters, because given the users' applications ($\delta$), the
Fig. 4. The illustration of the proof of Theorem 1.



designer can determine how many users can be allowed in the system, as well as how to set the threshold such that $p(y=1\mid\mathbf{p}^i)$ is large enough. When Conditions 1 and 2 are both satisfied, Condition 3 gives the set of feasible operating points under the given system parameters. We can choose any point satisfying Condition 3 as the feasible operating point.
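Condition 3 is easy to check for a candidate operating point. The ratio $R_i^{\min}/r_i$ can be read as the (discounted) fraction of slots in which user $i$ transmits, so the equality says that these shares fill all slots. The sketch below takes the boundaries $\mu_i$ as inputs, since their expression is not reproduced here; the $\mu_i$ values are hypothetical, while $\mathbf{R}^{\min} = (1, 2)$ and $\mathbf{r} = (2, 4)$ correspond to the two-user illustration in Section VII.

```python
def satisfies_condition3(r_min, r, mu, tol=1e-9):
    """Condition 3: sum_i R_i^min / r_i = 1 and r_i < R_i^min / mu_i.

    A boundary mu_i = 0 (e.g., obedient users) imposes no upper bound on r_i.
    """
    shares = [R / ri for R, ri in zip(r_min, r)]  # R_i^min / r_i for each user
    if abs(sum(shares) - 1.0) > tol:
        return False
    return all(m == 0 or ri < R / m for R, ri, m in zip(r_min, r, mu))


print(satisfies_condition3((1, 2), (2, 4), (0, 0)))    # True: shares 0.5 + 0.5
print(satisfies_condition3((1, 2), (2, 2), (0, 0)))    # False: shares sum to 1.5
print(satisfies_condition3((1, 2), (2, 4), (0.6, 0)))  # False: r_1 = 2 exceeds 1 / 0.6
```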

2) Select the optimal operating point: Given the set of feasible points obtained in Theorem 1, we need to select the optimal operating point $\mathbf{r}^*$ based on the energy efficiency criterion $E(\cdot)$. The following proposition formulates the problem of finding the optimal operating point.

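Proposition 1: The optimal operating point $\mathbf{r}^*$ can be obtained by solving the following optimization problem

$$\mathbf{r}^* = \arg\min_{\mathbf{r}} E(P_1(\mathbf{r}), \ldots, P_{M+N}(\mathbf{r})), \quad \text{subject to} \sum_{i\in\mathcal{M}\cup\mathcal{N}} R_i^{\min}/r_i = 1,\ r_i < R_i^{\min}/\mu_i, \qquad (9)$$

where $P_i(\mathbf{r}) = \frac{R_i^{\min}}{r_i}\, p_i^{\rm TDMA}(r_i)$. In particular, when $E(P_1, \ldots, P_{M+N})$ is jointly convex in $P_1, \ldots, P_{M+N}$, the above optimization problem is convex.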

Proof: See [?, Appendix C]. ■ 
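For two users and the weighted-sum criterion, problem (9) reduces to a one-dimensional search. Writing $x = R_1^{\min}/r_1$, the equality constraint gives $R_2^{\min}/r_2 = 1 - x$, and each energy $P_i = (R_i^{\min}/r_i)\,p_i^{\rm TDMA}(r_i)$ (user $i$ transmits a fraction $R_i^{\min}/r_i$ of the time) becomes $x_i\,p_i^{\rm TDMA}(R_i^{\min}/x_i)$, a perspective of a convex function, so the objective is convex in $x$. The sketch below uses a ternary search; the weights, gains, noise powers and the cap `eps` (which keeps $2^{r_i}$ finite in floating point) are hypothetical choices.

```python
def p_tdma(r, g, noise):
    """Interference-free TDMA power p_i^TDMA(r_i) = sigma_i^2 (2**r_i - 1) / g_ii."""
    return noise * (2 ** r - 1) / g


def two_user_objective(x, r_min, weights, gains, noises):
    """Weighted sum of P_i = (R_i^min / r_i) * p_i^TDMA(r_i) in terms of the time
    shares x_1 = x and x_2 = 1 - x (the equality constraint of (9))."""
    shares = (x, 1 - x)
    return sum(w * s * p_tdma(R / s, g, n)
               for w, s, R, g, n in zip(weights, shares, r_min, gains, noises))


def optimal_two_user_point(r_min, weights, gains, noises, mu=(0.0, 0.0), iters=200):
    """Ternary search for the optimal share x; returns (r_1*, r_2*) and x.

    r_i < R_i^min / mu_i is equivalent to x_i > mu_i. The cap eps (hypothetical)
    keeps r_i = R_i^min / x_i small enough for 2**r_i to stay finite.
    """
    eps = 0.01
    lo, hi = max(mu[0], eps), min(1 - mu[1], 1 - eps)

    def f(x):
        return two_user_objective(x, r_min, weights, gains, noises)

    for _ in range(iters):  # valid because f is convex in x
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    x = (lo + hi) / 2
    return (r_min[0] / x, r_min[1] / (1 - x)), x


r_sym, x_sym = optimal_two_user_point((1, 1), (0.5, 0.5), (1, 1), (0.05, 0.05))
r_asym, x_asym = optimal_two_user_point((1, 2), (0.5, 0.5), (1, 1), (0.05, 0.05))
print(r_sym, x_sym)
print(r_asym, x_asym)
```

For symmetric users the search returns $x \approx 0.5$, i.e., $\mathbf{r}^* \approx (2, 2)$. For $\mathbf{R}^{\min} = (1, 2)$ with equal weights it returns $x$ below 0.5, i.e., this particular criterion gives user 1 less than half of the slots.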
3) Construct the optimal deviation-proof policy: Given the optimal operating point obtained in the second step, each user $i$ runs the algorithm in Table III in a decentralized manner, and achieves its minimum throughput requirement. The resulting policy is deviation-proof, in that if a user does not follow the algorithm, it will either achieve a lower average throughput or achieve the same average throughput with a higher energy consumption.

As discussed before, a TDMA policy is specified by the users' transmit power levels and the transmission schedule. Once the optimal operating point $\mathbf{r}^*$ is selected, the transmit power levels $\mathbf{p}^{\rm TDMA}(\mathbf{r}^*)$ are determined. Hence, the key part of the algorithm is to determine the transmission schedule. On the one hand, the transmission schedule can be simply summarized as: the user farthest away from the optimal operating point transmits. On the other hand, it is nontrivial to define the "distance" from the
TABLE III

The algorithm run by each user i. 



Require: Normalized optimal operating points $\{R_j^{\min}/r_j^*\}_{j\in\mathcal{M}\cup\mathcal{N}}$ and its own operating point $r_i^*$

Initialization: Sets $t = 0$, $r'_j(0) = R_j^{\min}/r_j^*$ for all $j \in \mathcal{M}\cup\mathcal{N}$.

repeat 

if users enter or leave the network then: Updates according to Table V end if
Calculates the distance from the optimal operating point dj(t) = . , , J p(y = l|p J ), Vj 
Finds the user with the largest distance $i^* = \arg\max_{j\in\mathcal{M}\cup\mathcal{N}} d_j(t)$
if $i = i^*$ then

Transmits at power level $p_i^{\rm TDMA}(r_i^*)$
end if

Updates $r'_j(t+1)$ for all $j \in \mathcal{M}\cup\mathcal{N}$

if No Distress Signal Received At Time Slot t then 

r'At + 1) = r'At) - (| - 1)^3^(1 - r'At)), r$(f + 1) = ^.(t) • [l+ (i - 1) • ^^y] ,Vj / i 
else 

rj.(t+i) = K*(t). = »-K*)»yj / j* 

end if 
t 1 
until 



optimal operating point. As we will prove later, user $j$'s distance from the optimal operating point can be defined as the quantity $d_j(t)$ computed in Table III. Observe that the distance is increasing in $r'_j(t)$, which is the normalized throughput to achieve starting from time slot $t$. Hence, the larger the future throughput $r'_j(t)$ to fulfill, the farther a user is from the optimal operating point.

The intuition behind the algorithm is as follows. The key to the success of the algorithm is to make sure that the vector of future throughput $\mathbf{r}'(t)$ lies in the self-generating set (see Fig. 4) for all $t$. The sufficient conditions in Theorem 1 ensure that this is possible if we choose the future throughput appropriately, as in the algorithm. The way we choose the future throughput determines how each user's distance from the optimal operating point is updated, which has the following intuitive interpretation. In each time slot, if user $i^*$ transmits, its distance decreases in the next time slot, and the other users' distances increase. In this way, the other users have more opportunities to transmit in the next time slot. However, when the users receive the distress signal, which implies deviation, the distances do not change, so that user $i^*$ transmits again in the next time slot. Hence, a user does not have the incentive to deviate, because deviation leads to fewer opportunities to transmit in the future.
Fig. 5. Illustration of implementation: an initial information exchange phase followed by a decentralized implementation phase.



Theorem 2 ensures that if all the users run the algorithm in Table III locally, they will achieve the minimum throughput requirements $\{R_i^{\min}\}_{i\in\mathcal{M}\cup\mathcal{N}}$ and will have no incentive to deviate.



Theorem 2: If each user $i \in \mathcal{M}\cup\mathcal{N}$ runs the algorithm in Table III, then each user $i$ can achieve its minimum throughput requirement $R_i^{\min}$ with an energy consumption $P_i$ that minimizes the energy efficiency criterion $E(P_1, \ldots, P_{M+N})$. The policy implemented by the algorithm is deviation-proof: if a user does not follow the algorithm, it will either fail to achieve the minimum throughput requirement, or achieve it with a higher energy consumption.

Proof: See [?, Appendix D]. ■ 



C. Implementation 

Our proposed design framework can be implemented in two phases, as illustrated in Fig. 5: an initial information exchange phase in which the optimal operating point is calculated, followed by a decentralized implementation phase in which users run the algorithm in Table III in a completely decentralized manner. In the following, we first specify what information needs to be exchanged in the initial information exchange phase. Then we show that the total overhead of initial information exchange and feedback of the proposed framework is much smaller than that of existing works. Finally, we propose two approaches to perform the initial information exchange. One approach can be performed by the users in a completely decentralized manner, while the other is performed in a partially decentralized manner by the users and a local spectrum server (LSS) [18], [30]-[33], which helps to reduce the users' overhead in the process.



1) Overhead of initial information exchange and feedback: In Table IV, we compare the overhead of initial information exchange and feedback of the proposed framework with that of [18], which is the
TABLE IV

Comparison of the total overhead of initial information exchange and feedback.

| | Initial information exchange | Feedback in the run time |
|---|---|---|
| [18] | LSS to each user $i$: degradation of its minimum throughput requirement. Amount: $M + N$ real numbers | Each user $i$: $I_{-i}$ in each time slot; LSS and each PU: distress signal when necessary. Amount: $M + N$ real numbers in each time slot, a distress signal (possibly just a probe) when necessary |
| Proposed (without LSS) | Each user $i$ broadcasts to all the other users: $p(y=1\mid\mathbf{p}^i)$, $R_i^{\min}$, and $\{b_{ji}\}_{j\neq i}$. Amount: $(M+N)^2 + (M+N)$ real numbers | Each user $i$: its two reconstruction values once, distress signal when necessary. Amount: $2(M+N)$ real numbers once, a distress signal (possibly just a probe) when necessary |
| Proposed (with LSS) | Each user $i$ to LSS: $p(y=1\mid\mathbf{p}^i)$, $R_i^{\min}$, and $\{b_{ji}\}_{j\neq i}$; LSS to each user $i$: $\{p(y=1\mid\mathbf{p}^j)\}_{j\neq i}$, $r_i^*$, and $\{R_j^{\min}/r_j^*\}_{j}$. Amount: $(M+N)^2 + (M+N)$ real numbers | Each user $i$: its two reconstruction values once, distress signal when necessary. Amount: $2(M+N)$ real numbers once, a distress signal (possibly just a probe) when necessary |
| [11]-[17] | N/A | Each user $i$: $I_{-i}$ in each time slot. Amount: $M + N$ real numbers in each time slot |
| [19]-[22] | N/A | Each user $i$: $\mathbf{p}$ in each time slot. Amount: $(M+N)^2$ real numbers in each time slot |


only work that addresses energy-efficient spectrum sharing in cognitive radio networks. In the initial information exchange phase, the proposed framework has an additional overhead of $(M+N)^2$ compared to [18]. This additional overhead mainly comes from the exchange of $b_{ji}$, which is used for deviation-proofness.⁴ However, in the run time, the feedback overhead of the proposed policy is significantly lower than that of [18]. Specifically, each user's receiver feeds back its two reconstruction values only once, with a total overhead of $2(M+N)$. In contrast, in [18], each user $i$'s receiver needs to feed back the interference temperature $I_{-i}$ in each time slot. Hence, the total amount of feedback in [18] grows with time. In conclusion, our proposed framework has a much lower total overhead than [18].
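The totals in Table IV can be compared as a function of the number of slots. The sketch below simply adds up the amounts listed in Table IV, counting real numbers only and ignoring distress signals, whose number depends on how often deviation is suspected; the scheme labels are ours. With $K = M + N$ users, the proposed framework costs $K^2 + 3K$ numbers in total, while [18] costs $K + KT$ after $T$ slots, so the proposed framework is cheaper once $T > K + 2$.

```python
def total_overhead(num_users, num_slots, scheme):
    """Real numbers exchanged over num_slots slots, using the amounts in Table IV.

    num_users is M + N; distress signals ("when necessary") are not counted.
    """
    k = num_users
    if scheme == "proposed":        # with or without LSS: same amounts
        return (k * k + k) + 2 * k  # initial exchange + one-time feedback
    if scheme == "[18]":
        return k + k * num_slots    # initial exchange + per-slot feedback
    if scheme == "[11]-[17]":
        return k * num_slots        # per-slot feedback only
    if scheme == "[19]-[22]":
        return k * k * num_slots    # all power levels fed back every slot
    raise ValueError(f"unknown scheme: {scheme}")


for slots in (5, 12, 13, 100):
    print(slots, total_overhead(10, slots, "proposed"), total_overhead(10, slots, "[18]"))
```

For $K = 10$, the two totals coincide at 12 slots (130 numbers each), and from slot 13 on [18] is more expensive.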



At the end of Table IV, we also highlight the feedback overhead of several other works, although they were not proposed for the energy-efficient power control problem in cognitive radio networks. [11]-[17] propose energy-efficient stationary spectrum sharing policies in cellular or ad hoc networks. Since there is no differentiation between PUs and SUs in [11]-[17], no initial information exchange such as that in [18] is needed. However, their run-time feedback overhead is the same as in [18], which is much larger than that in the proposed framework. [19]-[22] design nonstationary policies in cellular or ad hoc networks under the framework of repeated games with perfect monitoring. Each user feeds back the individual transmit power levels of all the users in each time slot (how to obtain this information is not discussed in [19]-[22]), which requires the largest amount of feedback among all the related works.

⁴We will see later in Table VI that for obedient users, the overhead of initial information exchange in the proposed framework is $M + N$, which is the same as in [18].

2) Perform initial information exchange: The initial information exchange gathers the information required to solve for the optimal operating point, which serves as the input to the algorithm in Table III implemented by the users in a completely decentralized manner. Depending on whether there is an LSS or not, the users can perform the initial information exchange in a completely decentralized manner or in a partially decentralized manner with the aid of the LSS. In these two approaches, the optimization problem (9) is solved by each user and by the LSS, respectively.

In the completely decentralized approach, each user $i$ can broadcast the information needed (specified in Table IV) over a common control channel as in [19], [20], or through an initialization protocol [35]. We briefly describe the initialization protocol in [35], which assumes no prior knowledge of the number of users or the user indices and is particularly suitable for cognitive radio networks. [35] proposed a MAC protocol with an initialization protocol, in which all the users learn the number of users and their indices in a decentralized fashion, and have opportunities to convey information to the other users. Specifically, in the initialization protocol, the users first randomize over transmission and dormancy to compete for a time slot. With some probability, only one user transmits, and this user becomes the "winner". The winner then conveys some information through a predefined pattern of "transmit" and "idle" to let the other users know of its success. By counting the number of winners and observing the order of the winners, the users can learn the number of users and assign an index to each of them, respectively. We can extend the framework in [35] such that the winner conveys more information, such as its minimum throughput requirement $R_i^{\min}$, $p(y=1\mid\mathbf{p}^i)$, and $\{b_{ji}\}_{j\neq i}$.
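The contention step of the initialization protocol described above can be illustrated with a toy simulation. This is a simplified, hypothetical sketch rather than the protocol of [35]: the transmission probability `tx_prob` is fixed, a slot with exactly one transmitter always yields a winner, and the loop stops when every user has an index, which the simulator knows but real users would need a termination rule to infer (outside the scope of this sketch). User labels are illustrative.

```python
import random


def initialization_protocol(user_ids, tx_prob, rng):
    """Simplified contention-based index assignment (hypothetical sketch).

    In each slot, every user without an index transmits with probability tx_prob.
    A slot with exactly one transmitter produces a winner, whose index is the
    number of earlier winners. Returns ({user_id: index}, slots_used).
    """
    unassigned = list(user_ids)
    indices = {}
    slots = 0
    while unassigned:  # simulator-side stopping rule
        slots += 1
        transmitters = [u for u in unassigned if rng.random() < tx_prob]
        if len(transmitters) == 1:
            winner = transmitters[0]
            indices[winner] = len(indices)  # order of winning -> index
            unassigned.remove(winner)
    return indices, slots


indices, slots = initialization_protocol(["PU-a", "SU-b", "SU-c", "SU-d"], 0.3, random.Random(1))
print(indices, slots)
```

Counting the winners gives every user the number of users, and the order of winning gives each user its index, which is the property the text relies on.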

If there exists a local spectrum server as assumed in [18], [30]-[33], the initial information exchange can be performed jointly by the users and the LSS (specified in Table IV). The LSS can reduce the communication and computational overhead of the users. First, the users only communicate with the LSS, which means that they only need to decode the messages sent by the LSS, instead of separating and decoding the messages sent by all the other users. Second, after gathering all the information, the LSS can solve the optimization problem (9) for the optimal operating point, while in the first approach each user needs to solve it by itself. The disadvantage of this second approach, however, is the requirement of additional infrastructure (i.e., the LSS).
TABLE V

The procedure to update the parameters in the algorithm. 



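Require: set of current PUs $\mathcal{M}(t)$, set of current SUs $\mathcal{N}(t)$, current normalized operating points $\{r'_i(t)\}_{i\in\mathcal{M}(t)\cup\mathcal{N}(t)}$
if a user leaves the network then

if the user is PU then

$\mathcal{M}(t) \leftarrow \mathcal{M}(t)\setminus\{M(t)\}$

else

$\mathcal{N}(t) \leftarrow \mathcal{N}(t)\setminus\{N(t)\}$

end if

Update users' operating points $r'_i(t) \leftarrow r'_i(t)/\sum_{j\in\mathcal{M}(t)\cup\mathcal{N}(t)} r'_j(t)$ for all $i \in \mathcal{M}(t)\cup\mathcal{N}(t)$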
else if a user enters the network then
if the incoming user is PU then

$\mathcal{M}(t) \leftarrow \mathcal{M}(t)\cup\{M(t)+1\}$

Assign the incoming user its index $M(t)$ and its normalized operating point $r'_{M(t)}(t)$
Update SUs' operating points r'^t) <- r'^t) - for all i G Af(t) 

else if the incoming user is SU then
$\mathcal{N}(t) \leftarrow \mathcal{N}(t)\cup\{N(t)+1\}$

Assign the incoming user its index $N(t)$ and its normalized operating point $r'_{N(t)}(t)$
Update the existing SUs' operating points r-(i) <- rj(t) - for all i G A/"(t) \ {N(t)} 

end if 



Computational complexity: As we can see from Table III, the computational complexity of each user in constructing the optimal policy is very small. In each time slot, each user only needs to compute $N$ indices $\{d_j(t)\}_{j\in\mathcal{N}}$ and $N$ normalized values $\{r'_j(t)\}_{j\in\mathcal{N}}$, all of which are determined by analytical expressions. In addition, although the original definition of the policy requires each user to memorize the entire history of distress signals, in the actual implementation, each user only needs to know the current distress signal $y^t$ and memorize $N$ normalized values $\{r'_j(t)\}_{j\in\mathcal{N}}$.

VI. Extensions 

A. Users Entering and Leaving the Network 

We consider the scenario where users enter and leave the network. With users entering or leaving, the 
current operating point should change with the number of users. In general, there may be a convergence 
process to the new spectrum sharing policy and the new operating point as in [18]. However, as we 



will show later, one nice property of the proposed policy is that the algorithm in Table III to determine the active user can be adjusted on the fly without a convergence process. Specifically, when a user enters or leaves, we just update a few parameters in the algorithm, and starting from the next time slot, the subsequent transmission schedule determined by the updated algorithm is the "right" schedule,
namely the schedule that guarantees the minimum throughput requirements of existing PUs and SUs while 
maintaining their energy efficiency. This capability of instant adjustment results from the structure of the 
algorithm: it schedules the transmission according to normalized future throughput. As long as a user's 
normalized future throughput remains unchanged, it can achieve the minimum throughput requirement 
with the same energy efficiency regardless of the entry and exit of PUs/SUs. 

The proposed procedure to deal with the entry and exit of PUs/SUs can also be implemented in two ways, depending on whether there is an LSS or not. Without the LSS, the users that leave the network notify the existing users of their departure, and the incoming users need to request admission from the existing users. With the LSS, the LSS can play the role of adjusting to the entry and exit of PUs/SUs similar to that in [18]: it determines whether an incoming user can enter the network, and updates some parameters in the users' algorithms. Table V describes in detail how the parameters in the algorithm should be updated to cope with PUs/SUs entering and leaving. In the following, we describe the update procedure, give intuition for why the procedure works, and prove desirable properties of this framework.

When a user leaves the network, we could either reallocate its transmission opportunities to the 
remaining users, or change nothing in the algorithm by pretending that the user is still in the network. 
The first approach makes sure that the spectrum is utilized all the time, while the disadvantage is that 
some parameters in the algorithm need to be updated, which slightly increases the complexity of the 
algorithm. In this paper, we choose the first approach for spectrum and energy efficiency. Note that we 
could also modify the update procedure such that nothing is updated when a user leaves. In this case, 
although the spectrum is temporarily under-utilized, it will be fully utilized again when a new user enters. 
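The departure branch of Table V only rescales the remaining normalized operating points so that they again sum to one. A minimal sketch, assuming the normalized values $r'_i(t)$ are kept in a dictionary keyed by a hypothetical user label:

```python
def remove_user(shares, leaving):
    """Table V (departure): drop the leaving user, then rescale
    r'_i(t) <- r'_i(t) / sum_j r'_j(t) over the remaining users."""
    remaining = {u: v for u, v in shares.items() if u != leaving}
    total = sum(remaining.values())
    return {u: v / total for u, v in remaining.items()}


updated = remove_user({"PU1": 0.3, "SU1": 0.2, "SU2": 0.5}, "SU2")
print(updated)  # PU1 and SU1 keep their 3:2 ratio and now sum to 1
```

The remaining users keep their relative shares and only the scale changes, which is how the departing user's transmission opportunities are reallocated in the first approach.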

If some users request to enter, the current operating points need to be changed in order to create transmission opportunities for the incoming users. A rule of thumb is that the PUs' operating points should remain intact, such that their minimum throughput requirements and energy consumptions remain the same. Instead, we reduce the transmission opportunities of the existing SUs to accommodate the incoming user.

The following theorem proves that, with the proposed update procedure in Table V, the average throughput and energy efficiency of the existing users can be maintained with users entering and leaving.

Theorem 3: The spectrum sharing policy with the update algorithm in Table V ensures that, with PUs/SUs entering and leaving the network, each user's minimum throughput requirement is still achieved with an equal or smaller energy consumption.
TABLE VI

Comparison of Design Frameworks For Selfish and Obedient Users.

| | Conditions | Boundary | Algorithm | Amount of initial information exchange |
|---|---|---|---|---|
| Obedient | Conditions 2, 3 | $\mu_i = 0, \forall i$ | $b_{ij} = -\infty, \forall i, j$ | $M + N$ |
| Selfish | Conditions 1, 2, 3 | $\mu_i > 0, \forall i$ | $b_{ij} \in (-\infty, 0), \forall i, j$ | $(M+N)^2 + (M+N)$ |

Proof: See [?, Appendix E]. 



B. Obedient Users 



Obedient users will follow the spectrum sharing policy as long as their minimum throughput requirements are achieved. Hence, we can simply set the benefit from deviation as $b_{ij} = -\infty$ for all $i, j \in \mathcal{M}\cup\mathcal{N}$.



We summarize the differences in the design frameworks for selfish users and obedient users in Table VI. First, the sufficient conditions for feasible operating points are reduced to Conditions 2 and 3. Second, the boundaries of the feasible operating points $\mu_i$ become zero. In other words, the operating points $r_i$ can be arbitrarily large. Third, in the algorithm to compute the spectrum sharing policy, since $b_{ij} = -\infty$, the terms related to $b_{ij}$ vanish, which makes the algorithm simpler. Moreover, the information exchange is reduced to $2N$, because the information exchanged consists only of the minimum throughput requirements and the optimal operating points.

VII. Performance Evaluation 

In this section, we demonstrate the performance gain of our spectrum sharing policy over existing policies, and validate our theoretical analysis through numerical results. Throughout this section, we use the following system parameters by default unless stated otherwise. The noise powers at all the users' receivers are 0.05 W. For simplicity, we assume that the direct channel gains have the same distribution $g_{ii} \sim \mathcal{CN}(0, 1), \forall i$, and the cross channel gains have the same distribution $g_{ij} \sim \mathcal{CN}(0, \alpha), \forall i \neq j$, where $\alpha$ is defined as the cross interference level. The channel gain from each user to the LSS also satisfies $g_{0i} \sim \mathcal{CN}(0, 1), \forall i$. The interference temperature threshold is 1 W. The measurement error $\varepsilon$ is Gaussian distributed with zero mean and variance 0.1. The energy efficiency criterion is the average transmit power of each user. The discount factor is 0.95.

A. Comparisons Against Existing Policies 

First, assuming that the population is fixed, we compare the proposed policy against the optimal stationary policy in [11], [18] and adapted versions of the punish-forgive policies in [19]-[22], which are described as follows.

• The optimal stationary policy: each user transmits at a fixed power level that is just large enough 
to fulfill the throughput requirement under the interference from other users. 

• The adapted stationary punish-forgive (SPF) policy: the punish-forgive policies in [19]-[21] were originally proposed for network utility maximization problems (e.g., maximizing the sum throughput). We adapt the SPF policies to solve the energy efficiency problem in (7). The SPF policies are dynamic policies that have two phases. When the users have not received the distress signal, they transmit at the optimal stationary power levels. When they receive a distress signal that indicates deviation, they switch to the punishment phase, in which all the users transmit at the Nash equilibrium power levels. In the energy efficiency formulation, the optimal stationary power levels are the Nash equilibrium power levels. Hence, the adapted SPF policy is essentially the same as the optimal stationary policy.

• The adapted nonstationary punish-forgive (NPF) policy: the punish-forgive policy in [22] differs from those in [19]-[21] in that nonstationary power levels are used when the users have not received the distress signal. In the simulation, we adapt the NPF policy in [22] such that the users transmit in the same way as in the proposed policy when they have not received the distress signal.

Since the adapted SPF policy is the same as the optimal stationary policy, we refer to the adapted NPF 
policy as the "punish-forgive" policy. 

1) Illustrations of Different Policies: We first illustrate the three different policies in terms of the users' transmit power levels, and their discounted average energy consumption and throughput in Table VII.



Consider a simple example of two users with minimum throughput requirements of 1 bit/s/Hz and 2 bits/s/Hz, respectively. The direct channel gains are fixed to 1 and the cross channel gains are fixed to 0.5.
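The running averages reported in Table VII can be recomputed from the per-slot power levels. The sketch below (names hypothetical) assumes unit direct gains, 0.05 W noise and a single transmitter per TDMA slot, and reports at each $t$ the discounted averages $\sum_{\tau\le t}\delta^\tau x_\tau / \sum_{\tau\le t}\delta^\tau$. The table entries for this example appear consistent with $\delta = 0.9$; this is our inference from the numbers, and it differs from the default 0.95 stated for the rest of the section.

```python
import math


def running_averages(schedule, powers, noise, delta):
    """Discounted running averages sum_{tau<=t} delta^tau x_tau / sum_{tau<=t} delta^tau.

    schedule[t] is the index of the single user transmitting in slot t (TDMA, unit
    direct gains, no interference); powers[i] is user i's TDMA power level.
    Returns, for each t, (average throughputs, average energies).
    """
    n = len(powers)
    rates = [math.log2(1 + p / noise) for p in powers]
    thr_sum, eng_sum, weight = [0.0] * n, [0.0] * n, 0.0
    rows = []
    for t, user in enumerate(schedule):
        w = delta ** t
        weight += w
        thr_sum[user] += w * rates[user]
        eng_sum[user] += w * powers[user]
        rows.append(([x / weight for x in thr_sum], [x / weight for x in eng_sum]))
    return rows


# Proposed policy (selfish users): user 1, 2, 2, 1, 1, 1, 1 in slots 0..6.
rows = running_averages([0, 1, 1, 0, 0, 0, 0], [0.15, 0.75], noise=0.05, delta=0.9)
for t, (thr, eng) in enumerate(rows):
    print(t, [round(x, 2) for x in thr], [round(x, 2) for x in eng])
```

After rounding to two decimals, this reproduces the selfish-user row of Table VII, e.g., throughputs (1.05, 1.89) at $t = 1$ and (1.34, 1.31) at $t = 6$, and energies (0.10, 0.25) at $t = 6$.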

In the optimal stationary policy, user 1 and user 2 transmit at fixed power levels 0.5 W and 0.9 W, respectively, at all times. Compared to the power levels in the proposed policy (0.15 W and 0.75 W, respectively), the power levels in the stationary policy are much higher due to the multi-user interference. Hence, the average energy consumptions of the stationary policy are also higher.
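The power levels quoted above can be recovered from the throughput requirements. The sketch below (function names are ours) solves the two linear SINR equations $p_i = (2^{R_i^{\min}} - 1)(\sigma^2 + g_{ij}p_j)/g_{ii}$ of the stationary policy by Cramer's rule, and computes the interference-free TDMA powers for the operating point $\mathbf{r} = (2, 4)$, whose time shares $R_i^{\min}/r_i = 1/2$ sum to one. The parameters are those of this example.

```python
def stationary_powers(r_min, g_direct, g_cross, noise):
    """Two users transmitting in every slot: solve
    p_i = (2**R_i - 1) * (noise + g_cross * p_j) / g_direct for i = 1, 2."""
    gamma1, gamma2 = (2 ** r - 1 for r in r_min)  # target SINRs
    a1, a2 = gamma1 * g_cross / g_direct, gamma2 * g_cross / g_direct
    b1, b2 = gamma1 * noise / g_direct, gamma2 * noise / g_direct
    det = 1 - a1 * a2  # system: p1 - a1 p2 = b1, -a2 p1 + p2 = b2
    if det <= 0:
        raise ValueError("no stationary solution: interference too strong")
    return (b1 + a1 * b2) / det, (b2 + a2 * b1) / det


def tdma_powers(r_star, g_direct, noise):
    """TDMA: no multi-user interference, so p_i = noise * (2**r_i - 1) / g_direct."""
    return tuple(noise * (2 ** r - 1) / g_direct for r in r_star)


stat = stationary_powers((1, 2), 1.0, 0.5, 0.05)
tdma = tdma_powers((2, 4), 1.0, 0.05)
shares = (1 / 2, 2 / 4)  # R_i^min / r_i
print(stat, tdma, tuple(s * p for s, p in zip(shares, tdma)))
```

The stationary solution is (0.5, 0.9) W; the TDMA powers are (0.15, 0.75) W and, weighted by the time shares, give average energies (0.075, 0.375) W, close to the steady-state entries of Table VII. If the cross gain grows until `1 - a1 * a2 <= 0`, no stationary solution exists, which is the infeasibility mentioned earlier for the stationary policy.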

In the punish-forgive policy, the users transmit alternately at the same low power levels as in the proposed policy (0.15 W and 0.75 W, respectively) until they receive the distress signal at time slot 3. Since the distress signal is broadcast in a time slot in which user 1 is transmitting, it indicates that user 2 may have deviated. The users then switch to the high power levels (0.5 W and 0.9 W, respectively) of the optimal stationary policy. Hence, the users' average energy consumptions also increase, and converge to the same levels as in the stationary policy (0.5 W and 0.9 W, respectively). On the contrary, in the proposed policy (for selfish users), they still transmit in a
TABLE VII
Illustrations of Different Policies.

| Policy | Quantity | $t = 0$ | $t = 1$ | $t = 2$ | $t = 3$, $y^3 = 1$ | $t = 4$ | $t = 5$ | $t = 6$ | steady-state |
|---|---|---|---|---|---|---|---|---|---|
| Stationary | power level (W) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) |
| | throughput (bits/s/Hz) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) |
| | energy (W) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) |
| Adapted SPF | power level (W) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) |
| | throughput (bits/s/Hz) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) | (1, 2) |
| | energy (W) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) |
| Adapted NPF | power level (W) | (0.15, 0) | (0, 0.75) | (0, 0.75) | (0.15, 0) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) | (0.5, 0.9) |
| | throughput (bits/s/Hz) | (2.00, 0) | (1.05, 1.89) | (0.74, 2.52) | (1.01, 1.99) | (1.00, 1.99) | (1.00, 1.99) | (1.00, 1.99) | (1, 2) |
| | energy (W) | (0.15, 0) | (0.08, 0.36) | (0.06, 0.47) | (0.08, 0.37) | (0.14, 0.46) | (0.19, 0.51) | (0.22, 0.55) | (0.5, 0.9) |
| Proposed (selfish) | power level (W) | (0.15, 0) | (0, 0.75) | (0, 0.75) | (0.15, 0) | (0.15, 0) | (0.15, 0) | (0.15, 0) | N/A |
| | throughput (bits/s/Hz) | (2.00, 0) | (1.05, 1.89) | (0.74, 2.52) | (1.01, 1.99) | (1.16, 1.67) | (1.27, 1.46) | (1.34, 1.31) | (1, 2) |
| | energy (W) | (0.15, 0) | (0.08, 0.36) | (0.06, 0.47) | (0.08, 0.37) | (0.09, 0.31) | (0.10, 0.27) | (0.10, 0.25) | (0.07, 0.37) |
| Proposed (obedient) | power level (W) | (0.15, 0) | (0, 0.75) | (0, 0.75) | (0.15, 0) | (0, 0.75) | (0.15, 0) | (0.15, 0) | N/A |
| | throughput (bits/s/Hz) | (2.00, 0) | (1.05, 1.89) | (0.74, 2.52) | (1.01, 1.99) | (0.84, 2) | (0.99, 2) | (1.09, 2) | (1, 2) |
| | energy (W) | (0.15, 0) | (0.08, 0.36) | (0.06, 0.47) | (0.08, 0.37) | (0.06, 0.43) | (0.07, 0.39) | (0.08, 0.34) | (0.07, 0.37) |


TDMA fashion with low power levels. As a punishment for user 2, user 1 will transmits in the first three 
time slots after receiving the distress signal, and user 2 has to wait for the opportunity to transmit until 
time slot 7. Since there is no multi-user interference, the average energy consumptions are lower than 
those in the punish-forgive policy. 

We also illustrate the difference between the proposed policy for selfish users and that for obedient
users. The main difference lies in how they react to the distress signal (after t = 3). In the policy for
obedient users, since the distress signal is caused by an erroneous measurement rather than a deviation,
the punishment is not triggered upon receiving the distress signal (i.e., user 2 transmits at t = 4).
In contrast, the distress signal triggers the punishment in the proposed policy for selfish users (i.e., user
1 transmits at t = 4). Because of the punishment triggered during the convergence process, the proposed
policy for selfish users achieves the minimum throughput requirements more slowly than
the policy for obedient users, which achieves them at t = 6.

Finally, we can see that in the steady state, the energy consumption of the proposed policy is much 
lower than those in the other policies. 
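The throughput and energy entries of Table VII can be read as normalized discounted averages. The sketch below recomputes them from the per-slot schedules in the power-level rows. It assumes a noise power of 0.05 W and a discount factor of 0.9. Neither value is given in the text, but together they reproduce, for example, the selfish-user entries (1.34, 1.31) and (0.10, 0.25) at t = 6.

```python
import math

NOISE = 0.05    # assumed noise power (W), inferred from Table VII, not stated
DELTA = 0.9     # assumed discount factor, inferred from Table VII, not stated
SLOT_POWER = {1: 0.15, 2: 0.75}  # TDMA power of each user in its own slot (W)


def discounted_averages(schedule, delta=DELTA, noise=NOISE):
    """Normalized discounted averages of throughput and energy up to the last slot.

    schedule[t] is the user that transmits alone in slot t. Every quantity is
    averaged with weights delta**t, normalized by the sum of those weights.
    """
    total_weight = 0.0
    throughput = [0.0, 0.0]
    energy = [0.0, 0.0]
    for t, user in enumerate(schedule):
        w = delta ** t
        total_weight += w
        p = SLOT_POWER[user]
        # unit direct gain and no interference, since only one user transmits
        throughput[user - 1] += w * math.log2(1 + p / noise)
        energy[user - 1] += w * p
    return ([x / total_weight for x in throughput],
            [e / total_weight for e in energy])


selfish = [1, 2, 2, 1, 1, 1, 1]    # user 1 keeps the channel for t = 4, 5, 6
obedient = [1, 2, 2, 1, 2, 1, 1]   # no punishment after the distress signal
for name, sched in (("selfish", selfish), ("obedient", obedient)):
    thr, en = discounted_averages(sched)
    print(name, [round(x, 2) for x in thr], [round(e, 2) for e in en])
```

Each prefix of a schedule gives the corresponding column. For instance, `discounted_averages([1, 2, 2])` corresponds to t = 2.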

2) Performance Gains: We compare the energy efficiency of the optimal stationary policy, the optimal
punish-forgive policy, and the proposed policy under different cross interference levels in Fig. 6a. We
consider a network of two users whose minimum throughput requirements are 1 bit/s/Hz. First, notice
that the energy efficiency of the proposed policy remains constant under different cross interference
levels, while the average transmit power increases with the cross interference level in the other two
policies. The proposed policy outperforms the other two policies at medium to high cross interference
levels (approximately when a > 0.3). At high cross interference levels (a > 1), no stationary policy
can fulfill the minimum throughput requirements. As a consequence, the punish-forgive policies cannot
fulfill the throughput requirements when a > 1, either.

Fig. 6. Energy efficiency of the stationary, punish-forgive, and proposed policies under different system parameters. (a) Different cross interference levels. (b) Different numbers of users. (c) Different minimum throughput requirements.
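The two thresholds discussed for Fig. 6a can be derived in a simplified model. The model choices are ours, not the paper's simulation parameters: two users at 1 bit/s/Hz each, unit direct gains, cross gain a, and equal TDMA time shares. The noise power cancels out of the comparison, so the sketch normalizes it to 1.

```python
def stationary_avg_power(a, rate=1.0, noise=1.0):
    """Per-user power of the symmetric two-user stationary policy (unit direct gains).

    Each user needs SINR s = 2**rate - 1, i.e. p = s * (a * p + noise).
    """
    s = 2 ** rate - 1
    if s * a >= 1:
        return None  # no finite solution: throughput target infeasible
    return s * noise / (1 - s * a)


def tdma_avg_power(rate=1.0, n_users=2, noise=1.0):
    """Time-averaged power when n_users share slots equally and never overlap."""
    per_slot = noise * (2 ** (n_users * rate) - 1)  # rate n*R in 1/n of the slots
    return per_slot / n_users


def crossover(rate=1.0):
    """Two-user cross gain above which TDMA needs less average power (noise = 1)."""
    s = 2 ** rate - 1
    t = tdma_avg_power(rate, n_users=2)
    # solve s / (1 - s * a) = t for a
    return (1 - s / t) / s


print(crossover())                # approximately 1/3
print(stationary_avg_power(1.0))  # None: infeasible
```

In this simplified model the crossover is at a = 1/3, and the stationary policy becomes infeasible at a = 1. Both values are consistent with the thresholds stated for Fig. 6a. The sketch does not reproduce the exact curves.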

In Fig. 6b, we examine how the performance of these three policies scales with the number of users.
The number of users in the network increases, while the minimum throughput requirement of each user
remains 1 bit/s/Hz. The cross interference level is a = 0.2. The stationary and punish-forgive
policies are infeasible when there are more than 6 users. In contrast, the proposed policy can
accommodate 18 users in the network, with each user transmitting at a power level less than 0.8 W.



Fig. 6c shows the joint spectrum and energy efficiency of the three policies. The optimal
stationary and punish-forgive policies are infeasible when the minimum throughput requirement is larger
than 1.6 bits/s/Hz. In contrast, the proposed policy can achieve a much higher spectrum efficiency
(2.5 bits/s/Hz) with a better energy efficiency (0.8 W transmit power). Under the same average transmit
power, the proposed policy is always more energy efficient than the other two policies.

In summary, the proposed policy significantly improves the spectrum and energy efficiency of existing
policies in most scenarios. In particular, the proposed policy achieves an energy saving of up to 80%
when the cross interference level is large or the number of users is large (e.g., when a = 0.9 in Fig. 6a and
when N = 7 in Fig. 6b). These are exactly the deployment scenarios where improvements in spectrum
and energy efficiency are much needed. In addition, the proposed policy remains feasible even
when the other policies cannot meet the minimum throughput requirements.
B. Adapting to Users Entering and Leaving the Network

We demonstrate how the proposed policy can seamlessly adapt to the entry and exit of PUs/SUs.
We consider a network with 10 PUs and 2 SUs initially. The PUs' minimum throughput requirements
range from 0.2 bits/s/Hz to 0.38 bits/s/Hz in 0.02 bits/s/Hz increments, namely PU n has a minimum
throughput requirement of 0.2 + (n − 1) × 0.02 bits/s/Hz. The SUs have the same minimum throughput
requirement of 0.1 bits/s/Hz. We show the dynamics of the average energy consumptions and throughput of
several PUs and all the SUs in Fig. 7.

In the first 100 time slots, all the users quickly achieve their minimum throughput
requirements, at around t = 50. The PUs have different energy consumptions because of their different
minimum throughput requirements. The two SUs converge to the same average energy consumption
and average throughput. Then SUs leave (t = 100) and enter (t = 150, 250), and a PU
enters (t = 200). During the entire process, the PUs/SUs that are initially in the
system maintain the same throughput and energy consumption. The new PU (PU 11) has a higher energy
consumption because of its higher minimum throughput requirement (0.4 bits/s/Hz) and because of the
limited transmission opportunities left for it. SU 3, however, does not need a higher energy consumption,
because it occupies the time slots originally assigned to SU 2, which left the network at t = 100. But SU 4
does need a higher energy consumption, because there are more SUs and fewer transmission opportunities
in the network after t = 250.

VIII. Conclusion 

In this paper, we proposed nonstationary spectrum sharing policies that allow the PUs and SUs to 
transmit in a TDMA fashion. The proposed policy can achieve high spectrum efficiency that is not 
achievable by existing policies, and is more energy efficient than existing policies under the same 
minimum throughput requirements. The proposed policy can achieve high spectrum and energy efficiency 
even when the users have erroneous and binary feedback of the interference temperature. We extended the
policy to the case with users entering and leaving the network, while still maintaining the spectrum and
energy efficiency of the existing users. The proposed policy is amenable to decentralized implementation
and is deviation-proof. Simulation results demonstrate significant performance gains over state-of-the-art policies.
Fig. 7. Dynamics of average energy consumption and average throughput with users entering and leaving the network. (a) Dynamics of average energy consumption. (b) Dynamics of average throughput. The horizontal axis of both panels is the time slot t. At t = 0, there are 10 PUs and 2 SUs. SU 2 leaves at t = 100. SU 3 enters at t = 150. PU 11 enters at t = 200. SUs 4-8 enter at t = 250. We only show PUs 1, 5, 9, 11 (solid lines) and SUs 1, 2, 3, 4 (dashed lines) in the figure.



Appendix A 
Proof of Lemma 1



Suppose that user j deviates from the TDMA policy by transmitting at a positive power level $p_j^t$ in
time slot t, which is assigned to user i, and decreasing its power level by $\epsilon^{t+s}$ in time slot t + s. We derive the conditions under
which this deviation allows user j to achieve its minimum throughput requirement with a lower energy
consumption.

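To maintain the minimum throughput requirement, user j's deviation should satisfy

$$\delta^{t}\log_2\left(1+\frac{p_j^{t}g_{jj}}{p_i^{t}g_{ij}+\sigma_j}\right)+\delta^{t+s}\log_2\left(1+\frac{\left(p_j^{t+s}-\epsilon^{t+s}\right)g_{jj}}{\sigma_j}\right)\ \ge\ \delta^{t+s}\log_2\left(1+\frac{p_j^{t+s}g_{jj}}{\sigma_j}\right). \tag{10}$$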



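To achieve high energy efficiency, user j should choose the decrease in its power level $\epsilon^{t+s}$ such that
equality holds for the above inequality. Hence, the decrease in its power level $\epsilon^{t+s}$ can be calculated as

$$\epsilon^{t+s}=\frac{p_j^{t+s}g_{jj}+\sigma_j}{g_{jj}}\left[1-\left(\frac{p_i^{t}g_{ij}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}\right)^{1/\delta^{s}}\right]. \tag{11}$$

Define $\Delta$ as the decrease in the energy consumption when user j deviates, namely

$$\Delta=\delta^{t+s}p_j^{t+s}-\left[\delta^{t}p_j^{t}+\delta^{t+s}\left(p_j^{t+s}-\epsilon^{t+s}\right)\right] \tag{12}$$

$$\quad=\delta^{t}\left\{\delta^{s}\,\frac{p_j^{t+s}g_{jj}+\sigma_j}{g_{jj}}\left[1-\left(\frac{p_i^{t}g_{ij}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}\right)^{1/\delta^{s}}\right]-p_j^{t}\right\}. \tag{13}$$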
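As a numerical sanity check of (11), the sketch below computes $\epsilon^{t+s}$ and confirms that (10) then holds with equality. The parameter values (unit direct gain, cross gain 0.5, noise 0.05, δ = 0.9) are hypothetical illustrations, not values from the analysis.

```python
import math


def power_decrease(p_dev, p_later, p_i, g_jj, g_ij, noise, delta, s):
    """epsilon^{t+s} from (11): the power cut in slot t+s that exactly offsets
    the throughput user j gains by transmitting p_dev in user i's slot t."""
    ratio = (p_i * g_ij + noise) / (p_dev * g_jj + p_i * g_ij + noise)
    return (p_later * g_jj + noise) / g_jj * (1 - ratio ** (delta ** -s))


def constraint_gap(p_dev, p_later, p_i, g_jj, g_ij, noise, delta, s, t=0):
    """Left-hand side minus right-hand side of (10) with epsilon chosen by (11)."""
    eps = power_decrease(p_dev, p_later, p_i, g_jj, g_ij, noise, delta, s)
    lhs = (delta ** t * math.log2(1 + p_dev * g_jj / (p_i * g_ij + noise))
           + delta ** (t + s) * math.log2(1 + (p_later - eps) * g_jj / noise))
    rhs = delta ** (t + s) * math.log2(1 + p_later * g_jj / noise)
    return lhs - rhs


# Hypothetical numbers: p_j^t = 0.01, p_j^{t+s} = 0.75, p_i^t = 0.15, s = 1.
print(constraint_gap(0.01, 0.75, 0.15, 1.0, 0.5, 0.05, 0.9, s=1))  # ~0 up to rounding
```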
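We examine the sign of $\Delta$ when $p_j^t>0$. Taking the derivative of $\Delta$ with respect to $p_j^t$, we have

$$\frac{\partial\Delta}{\partial p_j^{t}}=\delta^{t}\left[\frac{p_j^{t+s}g_{jj}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}\left(\frac{p_i^{t}g_{ij}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}\right)^{1/\delta^{s}}-1\right]. \tag{14}$$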



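When $p_j^{t+s}g_{jj}\le p_i^{t}g_{ij}$, both $\frac{p_j^{t+s}g_{jj}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}$ and $\frac{p_i^{t}g_{ij}+\sigma_j}{p_j^{t}g_{jj}+p_i^{t}g_{ij}+\sigma_j}$ are strictly smaller than 1 when $p_j^t>0$.
Hence, $\partial\Delta/\partial p_j^t<0$ when $p_j^t>0$. Since $\Delta=0$ when $p_j^t=0$, we have $\Delta<0$ for $p_j^t>0$. In summary, when
$p_j^{t+s}g_{jj}\le p_i^{t}g_{ij}$, deviation (i.e., transmitting in user i's time slot t) increases the energy consumption.

When $p_j^{t+s}g_{jj}>p_i^{t}g_{ij}$, observe that $\partial\Delta/\partial p_j^t\big|_{p_j^t=0}=\delta^t\big(\frac{p_j^{t+s}g_{jj}+\sigma_j}{p_i^{t}g_{ij}+\sigma_j}-1\big)>0$. Since $\partial\Delta/\partial p_j^t$ is a continuous function of $p_j^t$, there exists
some $p_j^t>0$ such that $\Delta>0$. The reason is as follows. Based on the continuity of $\partial\Delta/\partial p_j^t$, we know that
for any positive number $\xi>0$, there exists a $\zeta>0$ such that $\big|\partial\Delta/\partial p_j^t-\partial\Delta/\partial p_j^t\big|_{p_j^t=0}\big|<\xi$ for any $|p_j^t-0|<\zeta$.
Choosing $\xi=\partial\Delta/\partial p_j^t\big|_{p_j^t=0}$, we have $0<\partial\Delta/\partial p_j^t<2\cdot\partial\Delta/\partial p_j^t\big|_{p_j^t=0}$ for any $p_j^t\in(-\zeta,\zeta)$. Since $\partial\Delta/\partial p_j^t>0$ for
small positive $p_j^t<\zeta$ and $\Delta=0$ at $p_j^t=0$, $\Delta$ is increasing, and thus positive, for $0<p_j^t<\zeta$. Hence, when $p_j^{t+s}g_{jj}>p_i^{t}g_{ij}$,
deviation (i.e., transmitting in user i's time slot t) decreases the energy consumption.

In summary, the necessary and sufficient condition under which deviation decreases the energy consumption is $p_j^{t+s}g_{jj}>p_i^{t}g_{ij}$.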
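The two regimes of the argument can be seen numerically. This sketch evaluates $\Delta$ from (13) and its derivative (14). It uses hypothetical numbers (unit direct gain, cross gain 0.5, noise 0.05, δ = 0.9, s = 1) and checks (14) against a central finite difference.

```python
def energy_saving(p_dev, p_later, p_i, g_jj, g_ij, noise, delta, s, t=0):
    """Delta from (13): energy saved when user j transmits p_dev in user i's slot t."""
    ratio = (p_i * g_ij + noise) / (p_dev * g_jj + p_i * g_ij + noise)
    eps = (p_later * g_jj + noise) / g_jj * (1 - ratio ** (delta ** -s))  # (11)
    return delta ** t * (delta ** s * eps - p_dev)


def d_energy_saving(p_dev, p_later, p_i, g_jj, g_ij, noise, delta, s, t=0):
    """Derivative (14) of Delta with respect to p_dev."""
    total = p_dev * g_jj + p_i * g_ij + noise
    ratio = (p_i * g_ij + noise) / total
    return delta ** t * ((p_later * g_jj + noise) / total * ratio ** (delta ** -s) - 1)


common = dict(g_jj=1.0, g_ij=0.5, noise=0.05, delta=0.9, s=1)
# p_j^{t+s} g_jj = 0.75 > p_i^t g_ij = 0.075: a small deviation saves energy
print(energy_saving(0.01, 0.75, 0.15, **common))
# p_j^{t+s} g_jj = 0.75 <= p_i^t g_ij = 1.0: every deviation costs energy
print([energy_saving(p, 0.75, 2.0, **common) for p in (0.01, 0.1, 1.0)])
```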

Appendix B 
Proof of Theorem 1

The proof culminates in the demonstration that, under certain conditions, a set of Pareto optimal payoffs
can be a self-generating set. Then, according to [36, Proposition 7.3.1][34], all the payoffs in the set are
equilibrium payoffs. More specifically, we derive the necessary and sufficient conditions (i.e., Conditions 1-3
in Theorem 1) under which a subset of Pareto optimal payoffs is a self-generating set, and find the
largest subset of Pareto optimal payoffs that can be self-generating (i.e., the set defined in Theorem 1).

A. Preliminaries on Self-generating Sets 

We first provide some background on self-generating sets. Similar to Markov
decision processes (MDPs), when we analyze the game, we can decompose the average payoff into
the current payoff and the continuation payoff (i.e., the average payoff starting from the next time slot).
However, there are two key differences between the decomposition in a game and that in an MDP. First,
there are multiple users in a game, as opposed to MDPs, in which there is usually only one user. Second,
the incentive compatibility constraints, which are not present in an MDP, need to be considered in a game.
Hence, decomposability in a game is defined as follows [36, Definition 7.3.2][34].⁵

⁵ For ease of reference, we duplicate the definition in [36, Definition 7.3.2] here.
Definition 6 (Decomposability): A payoff $v\in\mathbb{R}^N$ is decomposable on a set $W\subseteq\mathbb{R}^N$ with respect
to discount factor $\delta$ and (pure) action profile $\mathbf{p}$ if there exists a mapping $\gamma: Y\to W$ such that for all
$i\in\mathcal{N}$, we have

$$v_i=(1-\delta)\,u_i(\mathbf{p})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|\mathbf{p}) \tag{15}$$

$$\quad\ge(1-\delta)\,u_i(p_i',\mathbf{p}_{-i})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|p_i',\mathbf{p}_{-i}),\quad\forall p_i'\in\mathcal{P}_i. \tag{16}$$

A payoff v is decomposable on a set W with respect to discount factor $\delta$ if there exists an action profile
$\mathbf{p}$ such that v is decomposable on W with respect to $\delta$ and $\mathbf{p}$.

In the above definition, we can see that each user i's payoff $v_i$ is decomposed into the current payoff
$u_i(\mathbf{p})$ and the expected continuation payoff $\sum_{y\in Y}\gamma_i(y)\rho(y|\mathbf{p})$, which specifies the continuation payoff
$\gamma_i(y)$ starting from the next period given the signal y. Importantly, the decomposition needs to be incentive
compatible, in the sense that no user i can improve its average payoff by choosing a different action $p_i'$.
For convenience, we write $\mathcal{B}(W;\delta,\mathbf{p})$ for the set of payoffs that can be decomposed on set W with respect
to discount factor $\delta$ and action profile $\mathbf{p}$, namely

$$\mathcal{B}(W;\delta,\mathbf{p})=\{v\in\mathbb{R}^N: v\text{ is decomposable on set }W\text{ with respect to }\delta\text{ and }\mathbf{p}\}. \tag{17}$$

Similarly, we write $\mathcal{B}(W;\delta)=\bigcup_{\mathbf{p}\in\mathcal{P}}\mathcal{B}(W;\delta,\mathbf{p})$ for the set of payoffs that can be decomposed on set W
with respect to discount factor $\delta$.

A self-generating set is a set W in which every payoff $v\in W$ is decomposable on the set W itself.
The formal definition is as follows [36, Definition 7.3.4][34].

Definition 7 (Self-generating Sets): A set W is self-generating under discount factor $\delta$ if $W\subseteq\mathcal{B}(W;\delta)$.

Self-generating sets play an important role in repeated game theory, because every payoff in a
self-generating set is an equilibrium payoff. We restate this important result formally in the following
lemma [36, Proposition 7.3.1][34].

Lemma 2 (Self-generation): For any bounded set $W\subset\mathbb{R}^N$, if W is self-generating, then every payoff
in W is an equilibrium payoff of the repeated game.

B. Outline of The Proof 

In the above subsection, we have summarized some important results related to self-generation in
repeated game theory. Now we outline the proof of Theorem 1.
Recall that due to Definition ??, the Pareto boundary of the considered repeated game is

$$B=\Big\{v:\sum_{i\in\mathcal{N}}\frac{v_i}{\bar v_i}=1,\ v_i\ge0,\ \forall i\in\mathcal{N}\Big\}.$$

Consider a subset of the Pareto boundary

$$B_{\boldsymbol\mu}=\Big\{v:\sum_{i\in\mathcal{N}}\frac{v_i}{\bar v_i}=1,\ \frac{v_i}{\bar v_i}\ge\mu_i,\ \forall i\in\mathcal{N}\Big\}, \tag{18}$$

where $\mu_i>0$ for all $i\in\mathcal{N}$. Our focus is to show that under certain conditions, the subset of the Pareto
boundary $B_{\boldsymbol\mu}$ can be a self-generating set, which means that every Pareto optimal payoff in $B_{\boldsymbol\mu}$ can be an
equilibrium payoff. In the next subsection, we derive the necessary conditions for $B_{\boldsymbol\mu}$ to be self-generating.
These necessary conditions lead to Conditions 1-3 in Theorem 1. A byproduct of the first necessary
condition is a set of constraints on the boundary $\boldsymbol\mu$ of the self-generating set $B_{\boldsymbol\mu}$ (i.e., the lower bound $\underline{\boldsymbol\mu}$ of
$\boldsymbol\mu$ in Theorem 1), which leads to the characterization of the largest possible self-generating set $B_{\boldsymbol\mu}$. In the
final subsection, we show that these necessary conditions are also sufficient for $B_{\boldsymbol\mu}$ to be self-generating.



C. Necessary Conditions For a Set of Pareto Optimal Payoffs To Be Self-generating 

Suppose that $B_{\boldsymbol\mu}$ is self-generating. Then for any payoff $v\in B_{\boldsymbol\mu}$, there exist an action profile $\mathbf{p}$ and
a mapping $\gamma: Y\to B_{\boldsymbol\mu}$ such that for all $i\in\mathcal{N}$, we have

$$v_i=(1-\delta)\,u_i(\mathbf{p})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|\mathbf{p}) \tag{19}$$

$$\quad\ge(1-\delta)\,u_i(p_i',\mathbf{p}_{-i})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|p_i',\mathbf{p}_{-i}),\quad\forall p_i'\in\mathcal{P}_i. \tag{20}$$

The first observation is that the action profile $\mathbf{p}$ that decomposes a Pareto optimal payoff $v\in B_{\boldsymbol\mu}$ must be
a payoff-maximizing action profile for a certain user. In other words, $\mathbf{p}\in\{\mathbf{p}^1,\ldots,\mathbf{p}^N\}$. This is because
the average payoff v and the continuation payoffs $\gamma(y),\ \forall y\in Y$, are all on the Pareto boundary B. In
other words, $\sum_{i\in\mathcal{N}}v_i/\bar v_i=1$ and $\sum_{i\in\mathcal{N}}\gamma_i(y)/\bar v_i=1,\ \forall y\in Y$. Since the average payoff is a convex
combination of the current payoff and the expected continuation payoff, the current payoff must also lie
on the Pareto boundary, i.e., $\sum_{i\in\mathcal{N}}u_i(\mathbf{p})/\bar v_i=1$. According to Definition ??, the only action profiles
that lie on the Pareto boundary are $\mathbf{p}^1,\ldots,\mathbf{p}^N$.

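Based on the above observation, we have $\mathcal{B}(W;\delta)=\bigcup_{i\in\mathcal{N}}\mathcal{B}(W;\delta,\mathbf{p}^i)$. Suppose that a payoff $v\in B_{\boldsymbol\mu}$
is decomposed by $\mathbf{p}^i$, namely $v\in\mathcal{B}(W;\delta,\mathbf{p}^i)$. Using the facts that $u_i(\mathbf{p}^i)=\bar v_i$ and $u_j(\mathbf{p}^i)=0,\ \forall j\ne i$,
we have

$$v_i=(1-\delta)\,\bar v_i+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|\mathbf{p}^i) \tag{21}$$

and for all $j\ne i$,

$$\begin{aligned}v_j&=\delta\sum_{y\in Y}\gamma_j(y)\,\rho(y|\mathbf{p}^i)\\&\ge(1-\delta)\,u_j(p_j,\mathbf{p}^i_{-j})+\delta\sum_{y\in Y}\gamma_j(y)\,\rho(y|p_j,\mathbf{p}^i_{-j}),\quad\forall p_j\in\mathcal{P}_j.\end{aligned} \tag{22}$$

Since every user $j\ne i$ chooses $p^i_j=0$ in action profile $\mathbf{p}^i$, we say that under action profile $\mathbf{p}^i$, user i is the
active user and each user $j\ne i$ is an inactive user.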

Next, we show that the incentive compatibility constraints for the inactive users and the active user imply
Condition 1 and Condition 2 of Theorem 1, respectively. The incentive constraints for inactive users also
give us constraints on the boundary $\boldsymbol\mu$ of $B_{\boldsymbol\mu}$. In addition, to make sure that $\gamma(y)\in B_{\boldsymbol\mu},\ \forall y$, the discount
factor should satisfy Condition 3 of Theorem 1.

1) Incentive Constraints for Inactive Users: We examine the incentive compatibility constraint for an
inactive user $j\ne i$ in (22), which will lead to the first necessary condition. First, since $u_j(p_j,\mathbf{p}^i_{-j})>0,\ \forall p_j>0$,
for the inequality in (22) to hold, we must have $\sum_{y\in Y}\gamma_j(y)\rho(y|\mathbf{p}^i)>\sum_{y\in Y}\gamma_j(y)\rho(y|p_j,\mathbf{p}^i_{-j})$,
which is equivalent to

$$\big[\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})\big]\cdot\big(\gamma_j(y_0)-\gamma_j(y_1)\big)>0,\quad\forall p_j>0. \tag{23}$$

Note that the probability of receiving the distress signal given action profile $(p_j,\mathbf{p}^i_{-j})$ is no smaller than
the probability given $\mathbf{p}^i$, because

$$\rho(y_0|p_j,\mathbf{p}^i_{-j})-\rho(y_0|\mathbf{p}^i)=\int_{\bar I-p^i_ig_{i0}-p_jg_{j0}}^{\bar I-p^i_ig_{i0}}f_{\epsilon}(x)\,dx\ge0. \tag{24}$$

Since $\rho(y_0|p_j,\mathbf{p}^i_{-j})\ge\rho(y_0|\mathbf{p}^i)$, we must have $\gamma_j(y_1)>\gamma_j(y_0)$. This requirement is intuitive: we should
set a lower continuation payoff following the distress signal $y_0$ in order to deter user $j\ne i$ from deviating
from $\mathbf{p}^i$.
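(24) only requires the deviating user's extra interference to shift the distress threshold. The sketch instantiates the error density $f_\epsilon$ as a zero-mean Gaussian. That choice, like the threshold `I_BAR` and the other numbers, is our own hypothetical assumption; the text keeps the density generic. The sketch checks that the probability gap equals the integral in (24).

```python
import math


def distress_prob(interference, threshold, err_std):
    """rho(y0 | .) in a hypothetical model: the distress signal is sent when
    interference + e exceeds threshold, with e ~ Normal(0, err_std**2)."""
    z = (threshold - interference) / (err_std * math.sqrt(2))
    return 0.5 * math.erfc(z)


def gaussian_cdf(x, err_std):
    """CDF of the assumed zero-mean Gaussian measurement error."""
    return 0.5 * (1 + math.erf(x / (err_std * math.sqrt(2))))


# Hypothetical values: I_BAR = 1.0, active user's interference p_i g_i0 = 0.6,
# deviating user j adds p_j g_j0 = 0.2, error standard deviation 0.1.
I_BAR, ACTIVE, EXTRA, STD = 1.0, 0.6, 0.2, 0.1
base = distress_prob(ACTIVE, I_BAR, STD)         # rho(y0 | p^i)
dev = distress_prob(ACTIVE + EXTRA, I_BAR, STD)  # rho(y0 | p_j, p^i_{-j})
integral = gaussian_cdf(I_BAR - ACTIVE, STD) - gaussian_cdf(I_BAR - ACTIVE - EXTRA, STD)
print(base, dev, dev - base, integral)
```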



From the equality constraint in (22), we have

$$\delta=\frac{v_j}{\sum_{y\in Y}\gamma_j(y)\,\rho(y|\mathbf{p}^i)}. \tag{25}$$
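Plugging in the above expression of $\delta$, we can eliminate the discount factor $\delta$ in the inequality of (22) and
obtain an equivalent inequality as follows:

$$\sum_{y\in Y}\gamma_j(y)\left[\rho(y|\mathbf{p}^i)+\frac{v_j}{u_j(p_j,\mathbf{p}^i_{-j})}\big(\rho(y|p_j,\mathbf{p}^i_{-j})-\rho(y|\mathbf{p}^i)\big)\right]\le v_j,\quad\forall p_j\ne p^i_j. \tag{26}$$

For notational simplicity, we write the coefficient of $\gamma_j(y_1)$ in the above inequality as

$$c_{ij}(p_j,\mathbf{p}^i_{-j})\triangleq\frac{u_j(p_j,\mathbf{p}^i_{-j})\,\rho(y_1|\mathbf{p}^i)+v_j\big(\rho(y_1|p_j,\mathbf{p}^i_{-j})-\rho(y_1|\mathbf{p}^i)\big)}{u_j(p_j,\mathbf{p}^i_{-j})} \tag{27}$$

$$\quad=\rho(y_1|\mathbf{p}^i)+v_j\,\frac{\rho(y_1|p_j,\mathbf{p}^i_{-j})-\rho(y_1|\mathbf{p}^i)}{u_j(p_j,\mathbf{p}^i_{-j})} \tag{28}$$

$$\quad=\rho(y_1|\mathbf{p}^i)+v_j\,\frac{\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})}{u_j(p_j,\mathbf{p}^i_{-j})}, \tag{29}$$

and define the maximum value of the coefficient $c_{ij}$ as

$$c^{+}_{ij}\triangleq\max_{p_j\in\mathcal{P}_j,\,p_j>0}c_{ij}(p_j,\mathbf{p}^i_{-j}) \tag{30}$$

$$\quad=\rho(y_1|\mathbf{p}^i)+v_j\cdot\max_{p_j\in\mathcal{P}_j,\,p_j>0}\frac{\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})}{u_j(p_j,\mathbf{p}^i_{-j})}. \tag{31}$$

Since $\gamma_j(y_1)>\gamma_j(y_0)$, the set of inequality constraints in (26), namely

$$c_{ij}(p_j,\mathbf{p}^i_{-j})\,\gamma_j(y_1)+\big(1-c_{ij}(p_j,\mathbf{p}^i_{-j})\big)\,\gamma_j(y_0)\le v_j,\quad\forall p_j>0, \tag{32}$$

is equivalent to a single constraint

$$c^{+}_{ij}\,\gamma_j(y_1)+\big(1-c^{+}_{ij}\big)\,\gamma_j(y_0)\le v_j. \tag{33}$$

Hence, the incentive constraints (22) for user j can be rewritten as

$$\begin{aligned}\rho(y_1|\mathbf{p}^i)\,\gamma_j(y_1)+\big(1-\rho(y_1|\mathbf{p}^i)\big)\,\gamma_j(y_0)&=v_j/\delta,\\ c^{+}_{ij}\,\gamma_j(y_1)+\big(1-c^{+}_{ij}\big)\,\gamma_j(y_0)&\le v_j,\end{aligned} \tag{34}$$

where $\mu_j\bar v_j\le\gamma_j(y)\le\bar v_j,\ \forall y\in Y$.

The first necessary condition of $B_{\boldsymbol\mu}\subseteq\mathcal{B}(B_{\boldsymbol\mu};\delta)$ is $c^{+}_{ij}\le0$, as stated in the following proposition.

Proposition 2: If $B_{\boldsymbol\mu}\subseteq\mathcal{B}(B_{\boldsymbol\mu};\delta)$, then $c^{+}_{ij}\le0$ for all $i\in\mathcal{N}$ and for all $j\ne i$.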

Proof: If $B_{\boldsymbol\mu}\subseteq\mathcal{B}(B_{\boldsymbol\mu};\delta)$, then any payoff v in $B_{\boldsymbol\mu}$ should satisfy $v\in\mathcal{B}(B_{\boldsymbol\mu};\delta)$. Pick a payoff $v^i$
in which

$$v^i_j=\mu_j\bar v_j,\ \forall j\ne i,\qquad v^i_i=\Big(1-\sum_{k\ne i}\mu_k\Big)\bar v_i. \tag{35}$$

Note that $v^i$ is the payoff profile in which every user $j\ne i$ has the smallest payoff $\mu_j\bar v_j$ and user i has
the largest payoff $(1-\sum_{k\ne i}\mu_k)\bar v_i$. We show that $v^i\in\mathcal{B}(B_{\boldsymbol\mu};\delta)$ implies $c^{+}_{ij}\le0$ for all $j\ne i$.

First, $v^i$ can only be decomposed by $\mathbf{p}^i$. Otherwise, suppose that $v^i$ is decomposed by $\mathbf{p}^j$, $j\ne i$. Then
the decomposition of user i's payoff is

$$v^i_i=\delta\big(\rho(y_1|\mathbf{p}^j)\,\gamma_i(y_1)+(1-\rho(y_1|\mathbf{p}^j))\,\gamma_i(y_0)\big). \tag{36}$$

Since the convex combination of $\gamma_i(y_1)$ and $\gamma_i(y_0)$ is equal to $v^i_i/\delta$, which is strictly larger than $v^i_i$, at least
one of $\gamma_i(y_1)$ and $\gamma_i(y_0)$ is strictly larger than $v^i_i$. However, $\gamma(y)\in B_{\boldsymbol\mu}$ implies that $\gamma_i(y)\le v^i_i,\ \forall y\in Y$,
which leads to a contradiction. Hence, $v^i$ can only be decomposed by $\mathbf{p}^i$.

Now that $v^i$ is decomposed by $\mathbf{p}^i$, we focus on the incentive constraints for an arbitrary user $j\ne i$ in
(34). From the equality in (34) and the requirement that $\gamma_j(y_1)>\gamma_j(y_0)$, we have $\gamma_j(y_1)>v^i_j/\delta>v^i_j$.
Then suppose that $c^{+}_{ij}>0$; in order to satisfy the inequality in (34), we must have $\gamma_j(y_0)<v^i_j$, which
contradicts the fact that $\gamma_j(y_0)\ge\mu_j\bar v_j=v^i_j$. Hence, we must have $c^{+}_{ij}\le0$ for all $j\ne i$.

Since the above argument applies to any $i\in\mathcal{N}$, we have $c^{+}_{ij}\le0$ for all $i\in\mathcal{N}$ and for all $j\ne i$.

∎

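The first necessary condition, $c^{+}_{ij}\le0$, has two implications. First, since $\rho(y_1|\mathbf{p}^i)$ and $v_j$ are both
positive, we have

$$\max_{p_j\in\mathcal{P}_j,\,p_j>0}\frac{\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})}{u_j(p_j,\mathbf{p}^i_{-j})}<0, \tag{37}$$

which leads to Condition 1 in Theorem 1, namely that the benefit from deviation satisfies $b_{ij}<0$.
Second, to decompose $v^i$, we have

$$c^{+}_{ij}=\rho(y_1|\mathbf{p}^i)+v^i_j\cdot\max_{p_j\in\mathcal{P}_j,\,p_j>0}\frac{\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})}{u_j(p_j,\mathbf{p}^i_{-j})} \tag{38}$$

$$\quad=\rho(y_1|\mathbf{p}^i)+\mu_j\bar v_j\cdot\max_{p_j\in\mathcal{P}_j,\,p_j>0}\frac{\rho(y_0|\mathbf{p}^i)-\rho(y_0|p_j,\mathbf{p}^i_{-j})}{u_j(p_j,\mathbf{p}^i_{-j})} \tag{39}$$

$$\quad=\rho(y_1|\mathbf{p}^i)+\mu_j\,b_{ij} \tag{40}$$

$$\quad\le0, \tag{41}$$

which gives us a lower bound on $\mu_j$, namely

$$\mu_j\ge\frac{\rho(y_1|\mathbf{p}^i)}{-b_{ij}}=\frac{1-\rho(y_0|\mathbf{p}^i)}{-b_{ij}}. \tag{42}$$

Since $v^i$ should be decomposed for all $i\in\mathcal{N}$, we have

$$\mu_j\ge\max_{i\ne j}\frac{1-\rho(y_0|\mathbf{p}^i)}{-b_{ij}}, \tag{43}$$

which leads to the lower bound $\underline{\boldsymbol\mu}$ in Theorem 1.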
2) Incentive Constraints for the Active User: We examine the incentive constraints for the active
user i in (21), which will lead to the second necessary condition (i.e., Condition 2 in Theorem 1).

Suppose that a payoff $v\in B_{\boldsymbol\mu}$ is decomposed by $\mathbf{p}^i$. We rewrite the incentive constraint for the active
user i here:

$$\begin{aligned}v_i&=(1-\delta)\,\bar v_i+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|\mathbf{p}^i)\\&\ge(1-\delta)\,u_i(p_i,\mathbf{p}^i_{-i})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|p_i,\mathbf{p}^i_{-i}),\quad\forall p_i\in\mathcal{P}_i.\end{aligned} \tag{44}$$

Since $\gamma(y)\in B_{\boldsymbol\mu}$, given the inactive users' continuation payoffs $\gamma_j(y)$, the active user's continuation
payoff is determined by

$$\gamma_i(y)=\bar v_i\Big(1-\sum_{j\ne i}\frac{\gamma_j(y)}{\bar v_j}\Big).$$

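First, it is not difficult to check that if $\{\gamma_j(y)\}_{j\ne i},\ \forall y$, satisfy the inactive users' equality constraints in
(34), then $\gamma_i(y)$ will satisfy the active user's equality constraint in (44):

$$\begin{aligned}(1-\delta)\bar v_i+\delta\sum_{y\in Y}\gamma_i(y)\rho(y|\mathbf{p}^i)&=(1-\delta)\bar v_i+\delta\bar v_i\sum_{y\in Y}\Big[1-\sum_{j\ne i}\frac{\gamma_j(y)}{\bar v_j}\Big]\rho(y|\mathbf{p}^i)\\&=(1-\delta)\bar v_i+\delta\bar v_i\Big[1-\sum_{j\ne i}\frac{v_j/\delta}{\bar v_j}\Big]\\&=\bar v_i\Big(1-\sum_{j\ne i}\frac{v_j}{\bar v_j}\Big)=v_i.\end{aligned}$$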

The inequality constraint in (44) requires that the active user i has no incentive to choose another action
$p_i\ne p^i_i$. Although the active user i's current payoff is maximized at $p^i_i$, it may still have an incentive
to deviate for the following reason. Since $\gamma_j(y_1)>\gamma_j(y_0)$ for all $j\ne i$, we have $\gamma_i(y_1)<\gamma_i(y_0)$. In
other words, the active user i has a larger continuation payoff when the distress signal $y_0$ is received.
Hence, it may want to deviate so as to increase the probability of receiving the distress signal, if
the increase in its expected continuation payoff outweighs the decrease in its current payoff. To prevent
the active user i from deviating, we should make its continuation payoffs $\gamma_i(y_1)$ and $\gamma_i(y_0)$ as close
as possible. Equivalently, we should make the inactive users' continuation payoffs $\gamma_j(y_1)$ and $\gamma_j(y_0)$ as
close as possible.

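For an inactive user $j\ne i$, the closest continuation payoffs that satisfy the incentive constraints (34)
are the ones that satisfy the inequality with equality. Hence, we can solve for the continuation payoffs as

$$\gamma_j(y_1)=\frac{\frac{1}{\delta}\big(1-c^{+}_{ij}\big)-\big(1-\rho(y_1|\mathbf{p}^i)\big)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j,\qquad\gamma_j(y_0)=\frac{\rho(y_1|\mathbf{p}^i)-\frac{1}{\delta}c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j. \tag{45}$$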
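A quick way to confirm (45) is to plug it back into both lines of (34). The sketch does this with hypothetical values ($v_j$ = 0.3, $\rho(y_1|\mathbf{p}^i)$ = 0.9, $c^{+}_{ij}$ = −0.5, δ = 0.8) chosen only to satisfy $c^{+}_{ij}\le0<\rho(y_1|\mathbf{p}^i)$.

```python
def inactive_continuation(v_j, rho1, c_plus, delta):
    """Continuation payoffs (gamma_j(y1), gamma_j(y0)) of an inactive user from (45).

    rho1 = rho(y1 | p^i); c_plus = c_ij^+ (assumed <= 0 < rho1); 0 < delta < 1.
    """
    denom = rho1 - c_plus
    g1 = ((1 - c_plus) / delta - (1 - rho1)) / denom * v_j
    g0 = (rho1 - c_plus / delta) / denom * v_j
    return g1, g0


v_j, rho1, c_plus, delta = 0.3, 0.9, -0.5, 0.8   # hypothetical values
g1, g0 = inactive_continuation(v_j, rho1, c_plus, delta)
print(rho1 * g1 + (1 - rho1) * g0, v_j / delta)  # equality line of (34)
print(c_plus * g1 + (1 - c_plus) * g0, v_j)      # inequality of (34), now binding
```

Since $\gamma_j(y_1)-\gamma_j(y_0)=\frac{(1-\delta)v_j}{\delta(\rho(y_1|\mathbf{p}^i)-c^{+}_{ij})}$, the gap shrinks to zero as δ approaches 1.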
Given the inactive users' continuation payoffs, we can obtain the active user's continuation payoffs
$\gamma_i(y_1)$ and $\gamma_i(y_0)$. Plugging the expressions of $\gamma_j(y_1)$ and $\gamma_j(y_0)$ into the inequality in (44), we have, for
all $p_i\ne p^i_i$,

$$\begin{aligned}&v_i\ge(1-\delta)\,u_i(p_i,\mathbf{p}^i_{-i})+\delta\sum_{y\in Y}\gamma_i(y)\,\rho(y|p_i,\mathbf{p}^i_{-i})\\
\Leftrightarrow\ &v_i-(1-\delta)\,u_i(p_i,\mathbf{p}^i_{-i})-\delta\,\bar v_i\sum_{y\in Y}\Big[1-\sum_{j\ne i}\frac{\gamma_j(y)}{\bar v_j}\Big]\rho(y|p_i,\mathbf{p}^i_{-i})\ge0\\
\Leftrightarrow\ &(1-\delta)\,v_i-(1-\delta)\,u_i(p_i,\mathbf{p}^i_{-i})+(1-\delta)\,\bar v_i\sum_{j\ne i}\frac{v_j}{\bar v_j}\cdot\frac{\rho(y_1|p_i,\mathbf{p}^i_{-i})-c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\ge0\\
\Leftrightarrow\ &\bar v_i-u_i(p_i,\mathbf{p}^i_{-i})+\bar v_i\sum_{j\ne i}\frac{v_j}{\bar v_j}\cdot\frac{\rho(y_1|p_i,\mathbf{p}^i_{-i})-\rho(y_1|\mathbf{p}^i)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\ge0,\end{aligned}$$

where the last step divides by $1-\delta>0$ and uses $v_i=\bar v_i\big(1-\sum_{j\ne i}v_j/\bar v_j\big)$. This leads to Condition 2 in Theorem 1.
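The reduction of the active user's constraint can be checked numerically. The sketch builds $\gamma_j$ from (45) and $\gamma_i$ from the Pareto-boundary identity. It then compares the slack of (44) with $(1-\delta)$ times the left-hand side of the final inequality. All numbers are hypothetical, and v must lie on the Pareto boundary for the identity to hold.

```python
def gammas_for_profile(v, vbar, i, rho1, c_plus, delta):
    """Continuation payoffs under p^i: (45) for inactive users, Pareto boundary for user i.

    c_plus maps each inactive user j to c_ij^+. Returns {"y1": [...], "y0": [...]}.
    """
    n = len(v)
    gam = {}
    for y in ("y1", "y0"):
        g = [0.0] * n
        for j in range(n):
            if j == i:
                continue
            c = c_plus[j]
            if y == "y1":
                g[j] = ((1 - c) / delta - (1 - rho1)) / (rho1 - c) * v[j]
            else:
                g[j] = (rho1 - c / delta) / (rho1 - c) * v[j]
        g[i] = vbar[i] * (1 - sum(g[j] / vbar[j] for j in range(n) if j != i))
        gam[y] = g
    return gam


def slack_direct(v, vbar, i, rho1, rho1_dev, u_dev, c_plus, delta):
    """v_i - (1 - delta) u_i(dev) - delta * E[gamma_i | dev]: the slack of (44)."""
    gam = gammas_for_profile(v, vbar, i, rho1, c_plus, delta)
    cont = rho1_dev * gam["y1"][i] + (1 - rho1_dev) * gam["y0"][i]
    return v[i] - (1 - delta) * u_dev - delta * cont


def slack_reduced(v, vbar, i, rho1, rho1_dev, u_dev, c_plus, delta):
    """(1 - delta) times the left-hand side of the final (Condition 2) inequality."""
    s = sum(v[j] / vbar[j] * (rho1_dev - rho1) / (rho1 - c_plus[j])
            for j in range(len(v)) if j != i)
    return (1 - delta) * (vbar[i] - u_dev + vbar[i] * s)


# Hypothetical example: v / vbar = (0.5, 0.3, 0.2) lies on the Pareto boundary.
vbar, v = [1.0, 2.0, 1.5], [0.5, 0.6, 0.3]
args = dict(i=0, rho1=0.9, rho1_dev=0.85, u_dev=0.8, c_plus={1: -0.4, 2: -0.7}, delta=0.7)
print(slack_direct(v, vbar, **args), slack_reduced(v, vbar, **args))
```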

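3) Constraints on the Discount Factor: Now we derive the necessary conditions on the discount
factor. The minimum discount factor $\underline\delta(\boldsymbol\mu)$ required for $B_{\boldsymbol\mu}$ to be a self-generating set can be solved by

$$\underline\delta(\boldsymbol\mu)=\max_{v\in B_{\boldsymbol\mu}}\ \min\ \delta,\quad\text{subject to } v\in\mathcal{B}(B_{\boldsymbol\mu};\delta). \tag{46}$$

Since $\mathcal{B}(B_{\boldsymbol\mu};\delta)=\bigcup_{i\in\mathcal{N}}\mathcal{B}(B_{\boldsymbol\mu};\delta,\mathbf{p}^i)$, the above optimization problem can be reformulated as

$$\underline\delta(\boldsymbol\mu)=\max_{v\in B_{\boldsymbol\mu}}\ \min_{i\in\mathcal{N}}\ \min\ \delta,\quad\text{subject to } v\in\mathcal{B}(B_{\boldsymbol\mu};\delta,\mathbf{p}^i). \tag{47}$$

To solve the optimization problem (47), we explicitly express the constraint $v\in\mathcal{B}(B_{\boldsymbol\mu};\delta,\mathbf{p}^i)$ using the
results derived in the previous two subsections. The inactive users' continuation payoffs have been derived
in (45), and they determine the active user's continuation payoffs. Hence, the constraint $v\in\mathcal{B}(B_{\boldsymbol\mu};\delta,\mathbf{p}^i)$
on the discount factor $\delta$ is equivalent to

$$\gamma(y)\in B_{\boldsymbol\mu},\quad\forall y\in Y, \tag{48}$$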
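which can be written explicitly as

$$\gamma_j(y_1)=\frac{\frac{1}{\delta}\big(1-c^{+}_{ij}\big)-\big(1-\rho(y_1|\mathbf{p}^i)\big)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j\in[\mu_j\bar v_j,\bar v_j],\quad\forall j\ne i, \tag{49}$$

$$\gamma_j(y_0)=\frac{\rho(y_1|\mathbf{p}^i)-\frac{1}{\delta}c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j\in[\mu_j\bar v_j,\bar v_j],\quad\forall j\ne i, \tag{50}$$

$$\gamma_i(y_1)=\bar v_i\Big(1-\sum_{j\ne i}\frac{\gamma_j(y_1)}{\bar v_j}\Big)\in[\mu_i\bar v_i,\bar v_i], \tag{51}$$

$$\gamma_i(y_0)=\bar v_i\Big(1-\sum_{j\ne i}\frac{\gamma_j(y_0)}{\bar v_j}\Big)\in[\mu_i\bar v_i,\bar v_i]. \tag{52}$$

Since $\gamma_j(y_1)>\gamma_j(y_0)$, the constraints on $\gamma_j(y_1)$ and $\gamma_j(y_0)$ can be simplified as

$$\gamma_j(y_1)=\frac{\frac{1}{\delta}\big(1-c^{+}_{ij}\big)-\big(1-\rho(y_1|\mathbf{p}^i)\big)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j\le\bar v_j \tag{53}$$

$$\Leftrightarrow\ \delta\ge\frac{1-c^{+}_{ij}}{1-c^{+}_{ij}+\big(\frac{\bar v_j}{v_j}-1\big)\big(\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}\big)} \tag{54}$$

and

$$\gamma_j(y_0)=\frac{\rho(y_1|\mathbf{p}^i)-\frac{1}{\delta}c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j\ge\mu_j\bar v_j. \tag{55}$$

Note that the constraint (55) will be satisfied as long as $c^{+}_{ij}\le0$, because then $\gamma_j(y_0)\ge v_j\ge\mu_j\bar v_j$.

Since $\gamma_i(y_1)<\gamma_i(y_0)$, the constraints on $\gamma_i(y_1)$ and $\gamma_i(y_0)$ can be simplified as

$$\gamma_i(y_1)\ge\mu_i\bar v_i\ \Leftrightarrow\ \sum_{j\ne i}\frac{v_j}{\bar v_j}\cdot\frac{\frac{1}{\delta}\big(1-c^{+}_{ij}\big)-\big(1-\rho(y_1|\mathbf{p}^i)\big)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\le1-\mu_i \tag{56}$$

and

$$\gamma_i(y_0)\le\bar v_i. \tag{57}$$

Note that the above constraint on $\gamma_i(y_0)$ is satisfied as long as (55) is satisfied for all $j\ne i$. Note also
that the constraint (53) is satisfied as long as (56) is satisfied.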

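To sum up, the discount factor needs to satisfy the following constraint:

$$\delta\ge\frac{1}{1+\dfrac{v_i/\bar v_i-\mu_i}{\sum_{j\ne i}\frac{v_j}{\bar v_j}\cdot\frac{1-c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}}}. \tag{58}$$

Hence, the optimization problem (47) is equivalent to

$$\underline\delta(\boldsymbol\mu)=\max_{v\in B_{\boldsymbol\mu}}\ \min_{i\in\mathcal{N}}\ x_i(v), \tag{59}$$

where, substituting $c^{+}_{ij}=\rho(y_1|\mathbf{p}^i)+\frac{v_j}{\bar v_j}b_{ij}$ (from (31), with $b_{ij}$ as in (40)) into the right-hand side of (58),

$$x_i(v)=\frac{1}{1+\dfrac{v_i/\bar v_i-\mu_i}{1-v_i/\bar v_i+\rho(y_0|\mathbf{p}^i)\sum_{j\ne i}\frac{1}{-b_{ij}}}}.$$

Since $x_i(v)$ is decreasing in $v_i$ and increasing in $v_j,\ \forall j\ne i$, the payoff $v^*$ that maximizes $\min_{i\in\mathcal{N}}x_i(v)$
must satisfy $x_i(v^*)=x_j(v^*)$ for all i and j. Now we find the payoff $v^*$ such that $x_i(v^*)=x_j(v^*)$ for
all i and j.

Define $z\triangleq\dfrac{v^*_i/\bar v_i-\mu_i}{1-v^*_i/\bar v_i+\rho(y_0|\mathbf{p}^i)\sum_{j\ne i}\frac{1}{-b_{ij}}},\ \forall i\in\mathcal{N}$. Then we can solve for $v^*_i/\bar v_i$ as follows:

$$\frac{v^*_i}{\bar v_i}=\frac{\mu_i+z\Big(1+\rho(y_0|\mathbf{p}^i)\sum_{j\ne i}\frac{1}{-b_{ij}}\Big)}{1+z}. \tag{60}$$

Since $\sum_{i\in\mathcal{N}}v^*_i/\bar v_i=1$, we can solve for z as

$$z=\frac{1-\sum_{i\in\mathcal{N}}\mu_i}{N-1+\sum_{i\in\mathcal{N}}\rho(y_0|\mathbf{p}^i)\sum_{j\ne i}\frac{1}{-b_{ij}}}. \tag{61}$$

Hence, the minimum discount factor is $\underline\delta(\boldsymbol\mu)=\frac{1}{1+z}$, which leads to Condition 3 in Theorem 1.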
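The closed form (60)-(61) can be checked by confirming that it equalizes all the $x_i$. The sketch assumes our reading of (31) and (40), namely $c^{+}_{ij}=\rho(y_1|\mathbf{p}^i)+(v_j/\bar v_j)\,b_{ij}$, which gives the $x_i(v)$ used in (59). The inputs `rho0[i]` $=\rho(y_0|\mathbf{p}^i)$, `b[i][j]` $=b_{ij}<0$ and `mu` are hypothetical numbers. The helper `mu_lower_bound` evaluates the bound (43).

```python
def eta(i, rho0, b):
    """rho(y0 | p^i) * sum_{j != i} 1 / (-b_ij)."""
    return rho0[i] * sum(1.0 / -b[i][j] for j in range(len(rho0)) if j != i)


def min_discount_factor(mu, rho0, b):
    """delta(mu) = 1 / (1 + z) with z from (61); also returns v*_i / vbar_i from (60)."""
    n = len(mu)
    etas = [eta(i, rho0, b) for i in range(n)]
    z = (1 - sum(mu)) / (n - 1 + sum(etas))
    w_star = [(mu[i] + z * (1 + etas[i])) / (1 + z) for i in range(n)]
    return 1 / (1 + z), w_star


def x_value(i, w, mu, rho0, b):
    """x_i(v) as a function of the normalized payoffs w_k = v_k / vbar_k."""
    return 1 / (1 + (w[i] - mu[i]) / (1 - w[i] + eta(i, rho0, b)))


def mu_lower_bound(j, rho0, b):
    """Right-hand side of (43): max over i != j of (1 - rho(y0|p^i)) / (-b_ij)."""
    return max((1 - rho0[i]) / -b[i][j] for i in range(len(rho0)) if i != j)


# Hypothetical symmetric example: 3 users, rho(y0|p^i) = 0.1, b_ij = -5, mu_i = 0.2.
mu, rho0 = [0.2] * 3, [0.1] * 3
b = [[None if i == j else -5.0 for j in range(3)] for i in range(3)]
d, w = min_discount_factor(mu, rho0, b)
print(d, w, [x_value(i, w, mu, rho0, b) for i in range(3)])
```

With these numbers every $\mu_j$ = 0.2 exceeds the bound 0.18 from (43), and $\underline\delta(\boldsymbol\mu)\approx0.841$.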



D. Necessary Conditions Are Also Sufficient 

In the previous subsection, we have derived three necessary conditions for the set $B_{\boldsymbol\mu}$ to be self-generating.
Now we show that the three necessary conditions are also sufficient for $B_{\boldsymbol\mu}$ to be self-generating.

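Given any payoff $v\in B_{\boldsymbol\mu}$, we can determine the action profile $\mathbf{p}^i$ that decomposes it and the
corresponding continuation payoffs based on the results in the previous subsection. First, the action
profile $\mathbf{p}^i$ that decomposes v is determined by

$$i=\arg\min_{j\in\mathcal{N}}x_j(v)=\arg\max_{j\in\mathcal{N}}\frac{v_j/\bar v_j-\mu_j}{1-\mu_j+\rho(y_0|\mathbf{p}^j)\sum_{k\ne j}\frac{1}{-b_{jk}}}. \tag{62}$$

Then we determine the continuation payoffs as

$$\gamma_j(y_1)=\frac{\frac{1}{\delta}\big(1-c^{+}_{ij}\big)-\big(1-\rho(y_1|\mathbf{p}^i)\big)}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j,\qquad\gamma_j(y_0)=\frac{\rho(y_1|\mathbf{p}^i)-\frac{1}{\delta}c^{+}_{ij}}{\rho(y_1|\mathbf{p}^i)-c^{+}_{ij}}\,v_j,\quad\forall j\ne i, \tag{63}$$

$$\gamma_i(y)=\bar v_i\Big(1-\sum_{j\ne i}\frac{\gamma_j(y)}{\bar v_j}\Big),\quad\forall y\in Y.$$

Conditions 1 and 2 ensure that the incentive constraints for the inactive users (22) and the active user (21), respectively,
are satisfied by setting the continuation payoffs as above. Condition 3 on the discount factor $\delta$
ensures that the above continuation payoffs satisfy $\gamma(y)\in B_{\boldsymbol\mu}$. Hence, any payoff $v\in B_{\boldsymbol\mu}$ is decomposable on the
set $B_{\boldsymbol\mu}$ with respect to any discount factor $\delta\ge\underline\delta(\boldsymbol\mu)$. Then $B_{\boldsymbol\mu}$ is self-generating, and any payoff in $B_{\boldsymbol\mu}$ is an
equilibrium payoff.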
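One decomposition step, (62) followed by (63), can be sketched and checked against membership in $B_{\boldsymbol\mu}$. The sketch rests on two assumptions. First, it uses our reading $c^{+}_{ij}=\rho(y_1|\mathbf{p}^i)+(v_j/\bar v_j)\,b_{ij}$. Second, it takes $\rho(y_1|\mathbf{p}^i)=1-\rho(y_0|\mathbf{p}^i)$ with two signals. The numbers are hypothetical: $\mu_i$ = 0.2, $\rho(y_0|\mathbf{p}^i)$ = 0.1, $b_{ij}$ = −5, for which (61) gives $\underline\delta(\boldsymbol\mu)\approx0.841$. We therefore expect δ = 0.85 to keep γ(y) in $B_{\boldsymbol\mu}$, and δ = 0.8 to fail at the equalizing payoff $v^*$.

```python
def decompose(v, vbar, mu, rho0, b, delta):
    """One decomposition step: active user from (62), continuation payoffs from (63)."""
    n = len(v)
    w = [v[k] / vbar[k] for k in range(n)]
    etas = [rho0[k] * sum(1.0 / -b[k][j] for j in range(n) if j != k) for k in range(n)]
    # (62): the active user has the largest index (w_j - mu_j) / (1 - mu_j + eta_j)
    i = max(range(n), key=lambda k: (w[k] - mu[k]) / (1 - mu[k] + etas[k]))
    rho1 = 1 - rho0[i]
    gam = {}
    for y in ("y1", "y0"):
        g = [0.0] * n
        for j in range(n):
            if j == i:
                continue
            c = rho1 + w[j] * b[i][j]  # assumed form of c_ij^+ (see lead-in)
            if y == "y1":
                g[j] = ((1 - c) / delta - (1 - rho1)) / (rho1 - c) * v[j]
            else:
                g[j] = (rho1 - c / delta) / (rho1 - c) * v[j]
        g[i] = vbar[i] * (1 - sum(g[j] / vbar[j] for j in range(n) if j != i))
        gam[y] = g
    return i, gam


def in_B_mu(g, vbar, mu, tol=1e-9):
    """True if g lies on the Pareto boundary with g_k / vbar_k in [mu_k, 1]."""
    ws = [g[k] / vbar[k] for k in range(len(g))]
    return abs(sum(ws) - 1) < tol and all(mu[k] - tol <= ws[k] <= 1 + tol for k in range(len(g)))


mu, rho0, vbar = [0.2] * 3, [0.1] * 3, [1.0, 2.0, 4.0]
b = [[None if i == j else -5.0 for j in range(3)] for i in range(3)]
i, gam = decompose([0.5, 0.5, 1.0], vbar, mu, rho0, b, delta=0.85)
print(i, in_B_mu(gam["y1"], vbar, mu), in_B_mu(gam["y0"], vbar, mu))
```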
Appendix C
Proof of Theorem 2

We have characterized the largest set of Pareto optimal equilibrium payoffs $B_{\boldsymbol\mu}$. In the algorithm in
Table II, we start with the target payoff $v^*\in B_{\boldsymbol\mu}$ as the average payoff at period 0, and decompose it
into a current payoff and a continuation payoff. The decomposition tells us which action profile to play in
period 0. Then we decompose the continuation payoff and determine the action profile to play in period
1. By performing the decomposition in every period, we can determine which action profile to play given
any signal in every period.

Specifically, suppose that the continuation payoff at period t is $v(t)$. Then the action profile $\mathbf{p}^{i^*}$ that
decomposes $v(t)$ is determined by

$$i^*=\arg\min_{j\in\mathcal{N}}x_j(v(t))=\arg\max_{j\in\mathcal{N}}\frac{v_j(t)/\bar v_j-\mu_j}{1-\mu_j+\rho(y_0|\mathbf{p}^j)\sum_{k\ne j}\frac{1}{-b_{jk}}}, \tag{64}$$

where $\frac{v_j(t)/\bar v_j-\mu_j}{1-\mu_j+\rho(y_0|\mathbf{p}^j)\sum_{k\ne j}\frac{1}{-b_{jk}}}$ is exactly user j's index $\alpha_j(t)$. Then we can determine the continuation
payoff $v(t+1)$ according to